Foreword


This book is another example of CIMMYT's commitment to scientific advancement, 
and comprehensive and inclusive knowledge sharing and dissemination. By using 
alternative statistical models and methods to describe and analyze data sets from 
different disciplines, such as biology and agriculture, the book facilitates the adoption
and effective use of these tools by publicly funded researchers and practitioners
of national agricultural research extension systems (NARES) and universities across 
the Global South. 

The authors aim to offer different and new models, methods, and techniques to 
agricultural scientists who often lack the resources to adopt these tools or face 
practical constraints when analyzing different types of data. 

This work would not be possible without the continuous support of CIMMYT’s
outstanding partners and donors who invest in non-profit frontier research for the
benefit of millions of farmers and low-income communities worldwide. For that
reason, it could not be more fitting for this book to be published as an open access
resource for the international community to benefit from. I trust that this publication
will greatly contribute to accelerating the development and deployment of
resource-efficient and nutritious crops for a food-secure future.


Bram Govaerts 
Director General, CIMMYT 
Chapter 1
Elements of Generalized Linear Mixed Models


1.1 Introduction to Linear Models 


Linear models are commonly used to describe and analyze datasets from many
research areas, such as biology, agriculture, and the social sciences. A linear model aims
to represent the structure of a dataset as faithfully as possible. A model is usually made up of
one or more factors, which can be nominal or discrete variables (sex, year, etc.)
or continuous variables (age, height, etc.) that have an effect on the observed data.
Linear models are the most commonly used statistical models for estimating and
predicting a response based on a set of observations.

Linear models get their name because they are linear in the model parameters. 
The general form of a linear model is given by 


$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \qquad (1.1)$$


where $\mathbf{y}$ is the $n \times 1$ vector of observed responses, $\mathbf{X}$ is the $n \times (p + 1)$ design matrix
of fixed constants, $\boldsymbol{\beta}$ is the $(p + 1) \times 1$ vector of unknown parameters to be
estimated, and $\mathbf{e}$ is the $n \times 1$ vector of random errors. Linearity arises
because the mean response of the vector $\mathbf{y}$ is linear in the vector of unknown parameters
$\boldsymbol{\beta}$. Mathematically, this can be checked by taking the first derivative of the
predictor with respect to $\boldsymbol{\beta}$: if, after differentiation, the result is still a function of any of the
β parameters, then the model is said to be nonlinear; otherwise, it is a linear
model. In this case, the derivative of the predictor $\mathbf{X}\boldsymbol{\beta}$ in (1.1) with respect to $\boldsymbol{\beta}$ is equal
to $\mathbf{X}$, so the model in (1.1) is linear, since after differentiation the
predictor no longer depends on the β parameters.

Several models used in statistics are special cases of the general linear model
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$. These include regression models and analysis of variance (ANOVA)
models. Regression models generally refer to those in which the design matrix $\mathbf{X}$ is
of full column rank, whereas in analysis of variance models, the design matrix $\mathbf{X}$ is
not of full column rank. Some linear models are briefly described in the following
sections.

© The Author(s) 2023 1
J. Salinas Ruiz et al., Generalized Linear Mixed Models with Applications
in Agriculture and Biology, https://doi.org/10.1007/978-3-031-32800-8 1


1.2 Regression Models


Linear models are often used to model the relationship between a variable, known as
the response or dependent variable, $y$, and one or more predictors, known as
independent or explanatory variables, $X_1, X_2, \ldots, X_p$.


1.2.1 Simple Linear Regression


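Consider a model in which a response variable $y$ is linearly related to an explanatory
variable $X_1$ via

$$y_i = \beta_0 + \beta_1 X_{1i} + e_i$$

where $e_i$ ($i = 1, 2, \ldots, n$) are uncorrelated random errors, commonly
assumed to be normally distributed with mean 0 and constant variance $\sigma^2 > 0$,
that is, $e_i \sim N(0, \sigma^2)$. If $X_{11}, X_{12}, \ldots, X_{1n}$ are fixed constants, then this is a general linear
model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where

$$
\mathbf{y}_{n\times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X}_{n\times 2} = \begin{bmatrix} 1 & X_{11} \\ 1 & X_{12} \\ \vdots & \vdots \\ 1 & X_{1n} \end{bmatrix}, \quad
\boldsymbol{\beta}_{2\times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad
\mathbf{e}_{n\times 1} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$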


Example Let us consider the relationship between performance test scores and the
tissue concentration of lysergic acid diethylamide, commonly known as LSD (from the
German Lysergsäure-diethylamid), in a group of volunteers who received the drug
(Wagner et al. 1968). The average scores on the mathematical test and the LSD
tissue concentrations are shown in Table 1.1.

Table 1.1 Average mathematical test scores and LSD tissue concentrations

Tissue concentration of LSD    Average mathematical test score
1.17                           78.93
2.97                           58.20
3.26                           67.47
4.69                           37.47
5.83                           45.65
6.00                           32.92
6.41                           29.97

Table 1.2 Results of the simple regression analysis

(a) Type III tests of fixed effects
Effect     Num DF   Den DF   F-value   Pr > F
Conc       1        5        35.93     0.0019

(b) Parameter estimates
Effect      Estimate   Standard error   DF   t-value   Pr > |t|
Intercept   89.1239    7.0475           5    12.65     <0.0001
Conc        -9.0095    1.5031           5    -5.99     0.0019
Scale       50.7763    32.1137


The components of this regression model are as follows:

Distribution: $y_i \sim N(\mu_i, \sigma^2)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \times \text{Conc}_i$
Link function: $\mu_i = \eta_i$ (identity)


The syntax for performing a simple linear regression using the GLIMMIX
procedure in Statistical Analysis Software (SAS) is as follows:

proc glimmix;
model y = Conc/solution;
run;


Part of the results is shown in Table 1.2. The analysis of variance (part (a))
indicates that LSD concentration has a significant effect on average mathematical
performance (P = 0.0019). The estimates of the regression parameters,
$\hat{\beta}_0 = 89.1239$ and $\hat{\beta}_1 = -9.0095$, and the mean squared error (listed as “Scale”) are shown in Table 1.2(b)
under “Parameter estimates.”

With these results, the linear predictor ($\hat{\eta}_i$) that predicts average mathematical
performance as a function of LSD concentration is as follows:

$$\hat{\eta}_i = 89.124 - 9.0095 \times \text{Conc}_i$$


This means that we can predict the average mathematical performance of an
individual from their LSD tissue concentration ($\text{Conc}_i$).
The estimated slope shows a negative relationship between LSD concentration and
mathematical score: each additional unit of concentration lowers the predicted score
by about 9 points. Figure 1.1 shows that an increase in drug concentration has a negative
effect on the mathematical score of the youth. This fitted model explains 87.78% of the
variability in the data ($R^2 = 0.8778$; Fig. 1.1).
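You can reproduce the least squares estimates behind Table 1.2 outside SAS. The following is a minimal sketch in plain Python, not the GLIMMIX fitting algorithm. It applies the textbook closed-form formulas $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ to the seven observations of Table 1.1, including the first observation (1.17, 78.93). The function name `simple_ols` is a hypothetical helper.

```python
# Closed-form least squares for y = b0 + b1*Conc + e (LSD example, Table 1.1).
conc = [1.17, 2.97, 3.26, 4.69, 5.83, 6.00, 6.41]
score = [78.93, 58.20, 67.47, 37.47, 45.65, 32.92, 29.97]


def simple_ols(x, y):
    """Return (b0, b1, mse, r2) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse / (n - 2)  # residual variance, the "Scale" row of Table 1.2(b)
    return b0, b1, mse, 1 - sse / sst


b0, b1, mse, r2 = simple_ols(conc, score)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, MSE = {mse:.4f}, R^2 = {r2:.4f}")
```

The printed values should agree with Table 1.2 and Fig. 1.1 to the displayed precision. Because there is a single predictor, the closed form is enough and no matrix algebra is required.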
[Figure 1.1: observed data with the fitted line y = 89.124 - 9.0095*Conc (R² = 0.8778); x-axis: Concentration (LSD)]

Fig. 1.1 Relationship between applied drug concentration and the mathematical score of the youth. Fitted model of the relationship between the average score and LSD concentration.


1.2.2 Multiple Linear Regression


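Suppose that a response variable $y$ is linearly related to several independent variables
$X_1, X_2, \ldots, X_p$ such that

$$y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + e_i$$

for $i = 1, 2, \ldots, n$. Here, $e_i$ are uncorrelated random errors, normally
distributed with zero mean and constant variance $\sigma^2$, i.e., $e_i \sim N(0, \sigma^2)$. If the
explanatory variables are fixed constants, then the above model is a general
linear model of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, as can be seen below:

$$
\mathbf{y}_{n\times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X}_{n\times(p+1)} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix}, \quad
\boldsymbol{\beta}_{(p+1)\times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\mathbf{e}_{n\times 1} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
$$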
Table 1.3 Body weight (kilograms) and its relationship with heart girth circumference (centimeters) and
length (centimeters) of seven young bulls


Bull 1 2 3 4 5 6 7 

Weight (kilograms) 480 450 480 500 520 510 500 
Circumference (centimeters) 175 177 178 175 186 183 185 
Length (centimeters) 128 122 124 128 131 130 124


A regression analysis can be used to assess the relationship between explanatory 
variables and the response variable. It is also a useful tool for predicting future 
observations or simply describing the structure of the data. 


Example Let us fit a regression model of the relationship between body weight
and the heart girth and length of seven young bulls, using the data shown in
Table 1.3.


The components of this multiple regression model are as follows:

Distribution: $y_i \sim N(\mu_i, \sigma^2)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2}$
Link function: $\mu_i = \eta_i$ (identity)


The syntax for performing a multiple regression using the GLIMMIX procedure
in SAS, assuming that there is no interaction between bull heart girth (X1) and length
(X2), is shown below:


proc glimmix; 
model y = X1 X2/solution cl; 
run; 
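As a hedged cross-check of this GLIMMIX call, the following plain-Python sketch solves the normal equations $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ directly. It derives the standard errors from $\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$. The sketch assumes a length of 128 cm for bull 4, which is the value that reproduces the estimates in Table 1.4. The helper `invert` is a hypothetical illustration, not part of SAS.

```python
import math

# Young bulls (Table 1.3): weight y, heart girth X1, length X2.
weight = [480, 450, 480, 500, 520, 510, 500]
girth = [175, 177, 178, 175, 186, 183, 185]
length = [128, 122, 124, 128, 131, 130, 124]  # bull 4 assumed to be 128 cm


def invert(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


X = [[1.0, g, l] for g, l in zip(girth, length)]  # intercept column + X1 + X2
n, k = len(X), len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * y for row, y in zip(X, weight)) for i in range(k)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
resid = [y - sum(b * x for b, x in zip(beta, row)) for y, row in zip(weight, X)]
mse = sum(e * e for e in resid) / (n - k)  # residual variance ("Scale")
se = [math.sqrt(mse * xtx_inv[i][i]) for i in range(k)]

for name, b, s in zip(["Intercept", "X1", "X2"], beta, se):
    print(f"{name:<9} estimate = {b:9.4f}  SE = {s:8.4f}")
print(f"MSE = {mse:.2f}")
```

Solving the 3 × 3 normal equations explicitly keeps the example dependency-free. For larger problems, a QR-based solver from a numerical library would be the usual choice.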


Based on the regression model specification, the options “solution cl” prompt
GLIMMIX to provide the estimated parameters and their respective
confidence intervals. Other useful options are “htype = 1, 2, and 3,” which
request type I, II, and III tests of the fixed effects. The type III fixed effects tests in
part (a) of Table 1.4 indicate that there is a linear relationship between length (size)
and weight in young bulls (P = 0.0368), whereas heart girth is not significant (P = 0.1034).
The estimated parameters ($\beta_0$, $\beta_1$, $\beta_2$) with their respective confidence intervals,
as well as the MSE (scale) of the fitted regression model, are listed in part (b).

Table 1.4 Results of the multiple regression analysis

(a) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
X1       1        4        4.42      0.1034
X2       1        4        9.51      0.0368

(b) Parameter estimates
Effect      Estimate   Standard error   DF   t-value   Pr > |t|   Alpha   Lower      Upper
Intercept   -495.01    225.87           4    -2.19     0.0935     0.05    -1122.13   132.10
X1          2.2573     1.0739           4    2.10      0.1034     0.05    -0.7243    5.2388
X2          4.5808     1.4855           4    3.08      0.0368     0.05    0.4564     8.7053
Scale       139.51     98.6518

Note that in a linear model the parameters enter linearly, but the variables
do not necessarily have to. For example, consider the following two
models:

$$y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 \log(X_{i2}) + \cdots + \beta_k X_{ik} + e_i$$

$$y_i = \beta_0 + X_{i1}^{\beta_1} + \beta_2 X_{i2} + \cdots + \beta_k \exp(X_{ik}) + e_i$$


The first model is linear, whereas the second is not. In the second model, the derivatives
of the predictor with respect to the β coefficients do not depend on those coefficients, with
the exception of the term $X_{i1}^{\beta_1}$, whose derivative with respect to $\beta_1$ is
$X_{i1}^{\beta_1}\log(X_{i1})$. Because this derivative depends on $\beta_1$, the second
example is a nonlinear model.
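You can also check the derivative criterion numerically. The following sketch is an illustration, not a formal proof. It approximates the gradient of each predictor with respect to the β's by central differences at two arbitrary parameter vectors. For a model that is linear in β, the gradient does not change; for the second model, it does. The sketch assumes k = 3, and the observation `x` and the parameter values are arbitrary choices. All function names are hypothetical.

```python
import math


def linear_example(beta, x):
    # y = b0 + b1*X1 + b2*log(X2) + b3*X3: nonlinear in X2, linear in the betas
    return beta[0] + beta[1] * x[0] + beta[2] * math.log(x[1]) + beta[3] * x[2]


def nonlinear_example(beta, x):
    # y = b0 + X1**b1 + b2*X2 + b3*exp(X3): beta_1 appears as an exponent
    return beta[0] + x[0] ** beta[1] + beta[2] * x[1] + beta[3] * math.exp(x[2])


def gradient(f, beta, x, h=1e-6):
    """Central-difference derivative of f with respect to each beta_j."""
    grad = []
    for j in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[j] += h
        down[j] -= h
        grad.append((f(up, x) - f(down, x)) / (2 * h))
    return grad


def is_linear_in_beta(f, x, betas, tol=1e-5):
    """True if the gradient is (numerically) the same at every beta vector tried."""
    ref = gradient(f, betas[0], x)
    return all(
        abs(g0 - g1) < tol
        for other in betas[1:]
        for g0, g1 in zip(ref, gradient(f, other, x))
    )


x = [2.0, 3.0, 0.5]  # one arbitrary observation (X1, X2, X3)
betas = [[1.0, 0.5, 2.0, -1.0], [4.0, 1.5, -3.0, 0.2]]
print(is_linear_in_beta(linear_example, x, betas))     # True
print(is_linear_in_beta(nonlinear_example, x, betas))  # False
```

In the second model, only the $\beta_1$ component of the gradient changes ($X_{i1}^{\beta_1}\log X_{i1}$), which matches the analytic argument above.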


1.3 Analysis of Variance Models


1.3.1 One-Way Analysis of Variance 


Consider an experiment in which $t$ treatments ($t \geq 2$) are compared, with $n_i$ experimental units
selected and randomly assigned to the $i$th treatment. The model describing this
experiment is as follows:

$$y_{ij} = \mu + \tau_i + e_{ij}$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, n_i$. Here, $e_{ij}$ are uncorrelated random errors,
normally distributed with zero mean and constant variance $\sigma^2$ ($e_{ij} \sim N(0, \sigma^2)$). If the
treatment effects are considered fixed constants (the $t$ treatments are the only ones of interest), then
this model is a special case of the general linear model (1.1), with the total number of
experimental units $n = \sum_{i=1}^{t} n_i$.
і 
Table 1.5 Biomass production of the three types of bacteria

Bacteria A   Bacteria B   Bacteria C
12           20           40
15           19           35
9            23           42

Table 1.6 Analysis of variance

Sources of variation   Degrees of freedom
Bacteria type          t - 1 = 3 - 1 = 2
Error                  t(r - 1) = 3 × 2 = 6
Total                  tr - 1 = 3 × 3 - 1 = 8


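In matrix terms, the information under this design of experiment is equal to:

$$
\mathbf{y}_{n\times 1} = \begin{bmatrix} y_{11} \\ \vdots \\ y_{1n_1} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tn_t} \end{bmatrix}, \quad
\mathbf{X}_{n\times(t+1)} = \begin{bmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_t} & \mathbf{0}_{n_t} & \mathbf{0}_{n_t} & \cdots & \mathbf{1}_{n_t}
\end{bmatrix}, \quad
\boldsymbol{\beta}_{(t+1)\times 1} = \begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_t \end{bmatrix}, \quad
\mathbf{e}_{n\times 1} = \begin{bmatrix} e_{11} \\ e_{12} \\ \vdots \\ e_{tn_t} \end{bmatrix}
$$

where $\mathbf{1}_{n_i}$ is the vector of ones of order $n_i$ and $\mathbf{0}_{n_i}$ is the vector of zeros of order $n_i$.
Note that the matrix $\mathbf{X}_{n\times(t+1)}$ is not of full column rank because its first column can
be obtained as a linear combination (the sum) of its remaining columns.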
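You can verify this rank deficiency directly. The following sketch builds the one-way design matrix for t = 3 treatments with three replicates each, which is the layout of the bacteria example below. It then computes the rank with exact fraction arithmetic, so no floating-point tolerance is involved. `one_way_design` and `rank` are hypothetical helper names.

```python
from fractions import Fraction


def one_way_design(group_sizes):
    """Intercept column followed by one 0/1 indicator column per treatment."""
    t = len(group_sizes)
    rows = []
    for i, n_i in enumerate(group_sizes):
        for _ in range(n_i):
            rows.append([1] + [1 if k == i else 0 for k in range(t)])
    return rows


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


X = one_way_design([3, 3, 3])  # t = 3 treatments, n_i = 3 replicates each
print(len(X), "x", len(X[0]))  # 9 x 4
print("rank =", rank(X))       # 3, one less than the number of columns
print(all(row[0] == sum(row[1:]) for row in X))  # True: column 1 = sum of the others
```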


Example Assume that measurements of the biomass produced by three different 
types of bacteria are collected in three separate Petri dishes (replicates) in a glucose 
broth culture medium for each bacterium (Table 1.5). 


The sources of variation and degrees of freedom (DFs) for this experiment are 
shown in Table 1.6. 

The components of this one-way model, assuming that the response
variable $y_{ij}$ is normally distributed, are as follows:

Distribution: $y_{ij} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \mu + \tau_i$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where $y_{ij}$ is the response observed in the $j$th replicate of the $i$th bacterium, $\eta_{ij}$ is the
linear predictor, $\mu$ is the intercept (the grand mean), and $\tau_i$ is the fixed effect due to
the type of bacterium.
Table 1.7 Results of the one-way analysis of variance

(a) Fit statistics
-2 Res log likelihood                                                  33.36
AIC (Akaike information criterion) (smaller is better)                 41.36
AICC (Corrected Akaike information criterion) (smaller is better)      81.36
BIC (Bayesian information criterion) (smaller is better)               40.52
CAIC (Consistent Akaike's information criterion) (smaller is better)   44.52
HQIC (Hannan and Quinn information criterion) (smaller is better)      38.02
Pearson's chi-square                                                   52.67
Pearson's chi-square / DF                                              8.78

(b) Type III tests of fixed effects
Effect     Num DF   Den DF   F-value   Pr > F
Bacteria   2        6        64.95     <0.0001


The SAS syntax for a one-way analysis of variance (ANOVA) is as follows:

proc glimmix data=biomass;
class bacteria;
model y = bacteria;
lsmeans bacteria/lines;
run;


As in “proc glm” or “proc mixed,” the “class” statement declares the
classification (categorical or nominal) variables to be included in the model, in this
case the variable “bacteria.” The “model” statement declares the
response variable “y” and all the class or continuous variables that enter the
model, whereas the “lsmeans” statement asks GLIMMIX to estimate the treatment means,
and the “lines” option requests a comparison of those means. Part of
the results is presented below.

By default, “proc glimmix” provides fit statistics (information criteria),
which are useful for comparing candidate models and choosing the one that best
fits a dataset (part (a) of Table 1.7). The “-2 Res log likelihood” statistic is most useful when
comparing nested models, whereas the other statistics are useful for comparing
models that are not necessarily nested. In GLIMMIX, the mean squared error (MSE)
is given by the statistic “Pearson's chi-square/DF”; in this analysis, its value is
8.78 ($\hat{\sigma}^2 = \text{MSE} = 8.78$). In part (b), the analysis of variance indicates that at least
one type of bacterium produces a different biomass (P < 0.0001). That is, the null
hypothesis $H_0: \tau_A = \tau_B = \tau_C$ is rejected at a significance level of 5%.
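As a hedged cross-check of Table 1.7, you can compute the one-way ANOVA by hand from the biomass data in Table 1.5. The sketch assumes that the first-dish values (12, 20, and 40) are those implied by the least squares means in Table 1.8. Everything else follows the usual treatment and error sums of squares. `one_way_anova` is a hypothetical helper.

```python
# One-way ANOVA for the bacteria biomass data (Table 1.5), computed by hand.
biomass = {
    "A": [12, 15, 9],
    "B": [20, 19, 23],
    "C": [40, 35, 42],
}


def one_way_anova(groups):
    """Return (group means, SSE, MSE, F) for a one-way fixed-effects ANOVA."""
    all_obs = [y for ys in groups.values() for y in ys]
    n, t = len(all_obs), len(groups)
    grand = sum(all_obs) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_trt = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
    ss_err = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ms_trt = ss_trt / (t - 1)
    mse = ss_err / (n - t)
    return means, ss_err, mse, ms_trt / mse


means, sse, mse, f_value = one_way_anova(biomass)
print(means)  # least squares means (equal to raw means in this balanced design)
print(round(sse, 2), round(mse, 2), round(f_value, 2))
```

SSE corresponds to “Pearson's chi-square” and MSE to “Pearson's chi-square/DF” in Table 1.7.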

The least squares (LS) means estimated by “lsmeans” are tabulated
under the “Estimate” column of Table 1.8, with their standard errors in the “Standard error”
column. The comparisons of these means requested with the “lines” option use, by default,
Fisher’s LSD (least significant difference) test.
Table 1.8 Means and estimated standard errors of the one-way model

Least squares means of bacteria
Bacteria   Estimate   Standard error   DF   t-value   Pr > |t|
A          12.0000    1.7105           6    7.02      0.0004
B          20.6667    1.7105           6    12.08     <0.0001
C          39.0000    1.7105           6    22.80     <0.0001


Table 1.9 Comparison of the means (LSD) in the one-way model

T grouping of the least squares means of bacteria (α = 0.05)
LS means with the same letter are not significantly different
Bacteria   Estimate
C          39.0000    A
B          20.6667    B
A          12.0000    C


Finally, Table 1.9 presents the comparison of means obtained with “lines” and
indicates that bacteria type C has a better fermentative conversion of glucose to lactic
acid than bacteria types B and A. Means that share a letter are not significantly
different; here, each type has its own letter, so all three means differ from each other.
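You can approximate the letter grouping in Table 1.9 with Fisher's LSD. This sketch hard-codes the critical value $t_{0.975,6} \approx 2.4469$, because Python's standard library has no t quantile function. It uses the MSE of 8.78 and r = 3 replicates per bacterium. It is a simplified illustration, not the exact GLIMMIX computation.

```python
import math

means = {"A": 12.0, "B": 20.6667, "C": 39.0}  # LS means from Table 1.8
mse = 8.7778      # Pearson's chi-square / DF (Table 1.7)
r = 3             # replicates per bacterium
t_crit = 2.4469   # t(0.975, 6 error DF), hard-coded: the stdlib has no t quantile

lsd = t_crit * math.sqrt(2 * mse / r)  # least significant difference
print(f"LSD = {lsd:.2f}")

significant = {}
for a, b in [("C", "B"), ("C", "A"), ("B", "A")]:
    diff = means[a] - means[b]
    significant[(a, b)] = abs(diff) > lsd
    label = "different letters" if significant[(a, b)] else "same letter"
    print(f"{a} - {b} = {diff:7.4f} -> {label}")
```

Every pairwise difference exceeds the LSD, which is why each bacterium receives its own letter.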


1.3.2 Two-Way Nested Analysis of Variance


Let us consider an experiment with two factors, A and B, in which each level of B is
nested within a level of factor A; that is, each level of factor B appears within only one level
of factor A. Then, the model that describes this experiment is as follows:

$$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + e_{ijk}$$

for $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b_i$; and $k = 1, 2, \ldots, n_{ij}$. In this model, $\mu$ is the overall
mean, $\alpha_i$ represents the effect due to the $i$th level of factor A, and $\beta_{j(i)}$ represents the
effect of the $j$th level of factor B nested within the $i$th level of factor A. Assuming
that all factors are fixed and that the errors are normally distributed, that is,
$e_{ijk} \sim N(0, \sigma^2)$, this model is a general linear model of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$. For
example, suppose that there are a = 3 levels of factor A, b = 2 levels of factor B within each level of A, and
$n_{ij}$ = 2; then the vectors and matrices have the following form:


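$$
\begin{bmatrix} y_{111} \\ y_{112} \\ y_{121} \\ y_{122} \\ y_{211} \\ y_{212} \\ y_{221} \\ y_{222} \\ y_{311} \\ y_{312} \\ y_{321} \\ y_{322} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_{1(1)} \\ \beta_{2(1)} \\ \beta_{1(2)} \\ \beta_{2(2)} \\ \beta_{1(3)} \\ \beta_{2(3)} \end{bmatrix}
+
\begin{bmatrix} e_{111} \\ e_{112} \\ e_{121} \\ e_{122} \\ e_{211} \\ e_{212} \\ e_{221} \\ e_{222} \\ e_{311} \\ e_{312} \\ e_{321} \\ e_{322} \end{bmatrix}
$$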

Example Suppose that a researcher was studying the assimilation of fluorescently
labeled proteins in rat kidneys and wanted to know whether his two technicians,
technician A and technician B, were performing the procedure consistently. Technician A
randomly chose three rats, and technician B randomly chose three other
rats, and each technician measured the protein assimilation in each rat. Since rats are
expensive and measurements are cheap, both technicians measured protein assimilation
at various random locations in the kidneys of each rat (Table 1.10).


When performing a nested ANOVA, we are often interested in testing the null
hypothesis $H_0: \tau_A = \tau_B$. In this example, we do not wish to test whether the
subgroups (rats within technicians) are significantly different, since the goal is to
determine whether both technicians are performing the procedure consistently. The sources of
variation and degrees of freedom are shown in Table 1.11.

The components of this two-way model, assuming that the response variable $y_{ijk}$ is
normally distributed, are as follows:
Table 1.10 Levels of protein assimilation in the rat kidneys measured by both technicians

Technician B               Technician A
Rat1     Rat2     Rat3     Rat4     Rat5     Rat6
1.119    1.045    0.9873   1.3883   1.3952   1.2574
1.2996   1.1418   0.9873   1.104    0.9714   1.0295
1.5407   1.2569   0.8714   1.1581   1.3972   1.1941
1.5084   0.6191   0.9452   1.319    1.5369   1.0759
1.6181   1.4823   1.1186   1.1803   1.3727   1.3249
1.5962   0.8991   1.2909   0.8738   1.2909   0.9494
1.2617   0.8365   1.1502   1.387    1.1874   1.1041
1.2288   1.2898   1.1635   1.301    1.1374   1.1575
1.3471   1.1821   1.151    1.3925   1.0647   1.294
1.0206   0.9177   0.9367   1.0832   0.9486   1.4543

Table 1.11 Sources of variation and degrees of freedom of the two-way nested design

Sources of variation   Degrees of freedom
Technician             a - 1 = 2 - 1 = 1
Rat (technician)       a(b - 1) = 2(3 - 1) = 4
Error                  ab(r - 1) = 2 × 3(10 - 1) = 54
Total                  abr - 1 = 2 × 3 × 10 - 1 = 59


Distribution: $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \mu + \tau_i + \beta(\tau)_{j(i)}$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where $y_{ijk}$ is the level of assimilation of the fluorescent protein in the $k$th measurement obtained from rat $j$ by
technician $i$, $\mu$ is the intercept, $\tau_i$ is the fixed effect due to the technician, and $\beta(\tau)_{j(i)}$ is
the nested effect of rat $j$ within technician $i$.


The SAS commands for the main effect of factor A and the effect of factor B nested within A
are as follows:

proc glimmix data=rata nobound;
class technician rat rep;
model protein = technician rat(technician);
lsmeans technician rat(technician)/lines;
run;


Part of the results is shown in Table 1.12. The mean squared error
(Pearson's chi-square/DF) is 0.04 (part (a)), which indicates little residual variability
among the repeated measurements within each rat. The analysis of variance in part
(b) indicates that there is no difference between technicians in the measurement of fluorescent
protein in the rats (P = 0.3065). Since average protein uptake varies between rats, it is to be
expected that there are mean differences in protein uptake between rats within technicians
(P = 0.0067).

Table 1.12 Fit statistics of the two-way nested design

(a) Fit statistics
-2 Res log likelihood         -12.39
AIC (smaller is better)       1.61
AICC (smaller is better)      4.04
BIC (smaller is better)       15.53
CAIC (smaller is better)      22.53
HQIC (smaller is better)      6.98
Pearson's chi-square          1.95
Pearson's chi-square / DF     0.04

(b) Type III tests of fixed effects
Effect             Num DF   Den DF   F-value   Pr > F
Technician         1        54       1.07      0.3065
Rat (technician)   4        54       3.98      0.0067

Table 1.13 Comparison of the means (LSD) in the nested model

(a) Technician least squares means
Technician   Estimate   Standard error   DF   t-value   Pr > |t|
A            1.2110     0.03466          54   34.94     <0.0001
B            1.1604     0.03466          54   33.48     <0.0001

(b) T grouping of the technician least squares means (α = 0.05)
LS means with the same letter are not significantly different
Technician   Estimate
A            1.2110     A
B            1.1604     A
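The following sketch offers a hedged cross-check of the F-tests in Table 1.12. It computes the nested sums of squares from the data in Table 1.10 and tests both technician and rat(technician) against the residual MSE, as in the fixed-effects model above. The sketch rests on two assumptions:

- Rat1 to Rat3 were measured by technician B and Rat4 to Rat6 by technician A, as the least squares means in Tables 1.13 and 1.14 imply.
- Two garbled Rat3 entries read 0.9452 and 1.151.

The names `rows`, `tech_of`, and `mean` are illustrative.

```python
# Table 1.10: ten readings (rows) for each of six rats (columns Rat1..Rat6).
rows = [
    [1.119, 1.045, 0.9873, 1.3883, 1.3952, 1.2574],
    [1.2996, 1.1418, 0.9873, 1.104, 0.9714, 1.0295],
    [1.5407, 1.2569, 0.8714, 1.1581, 1.3972, 1.1941],
    [1.5084, 0.6191, 0.9452, 1.319, 1.5369, 1.0759],
    [1.6181, 1.4823, 1.1186, 1.1803, 1.3727, 1.3249],
    [1.5962, 0.8991, 1.2909, 0.8738, 1.2909, 0.9494],
    [1.2617, 0.8365, 1.1502, 1.387, 1.1874, 1.1041],
    [1.2288, 1.2898, 1.1635, 1.301, 1.1374, 1.1575],
    [1.3471, 1.1821, 1.151, 1.3925, 1.0647, 1.294],
    [1.0206, 0.9177, 0.9367, 1.0832, 0.9486, 1.4543],
]
rats = {f"Rat{i + 1}": list(col) for i, col in enumerate(zip(*rows))}
# Assumed assignment of rats to technicians (consistent with Tables 1.13 and 1.14).
tech_of = {"Rat1": "B", "Rat2": "B", "Rat3": "B", "Rat4": "A", "Rat5": "A", "Rat6": "A"}


def mean(values):
    return sum(values) / len(values)


rat_mean = {rat: mean(v) for rat, v in rats.items()}
tech_obs = {}
for rat, v in rats.items():
    tech_obs.setdefault(tech_of[rat], []).extend(v)
tech_mean = {tech: mean(v) for tech, v in tech_obs.items()}
grand = mean([y for v in rats.values() for y in v])

a, b, r = 2, 3, 10  # technicians, rats per technician, readings per rat
ss_tech = sum(len(v) * (tech_mean[tech] - grand) ** 2 for tech, v in tech_obs.items())
ss_rat = sum(r * (rat_mean[rat] - tech_mean[tech_of[rat]]) ** 2 for rat in rats)
ss_err = sum((y - rat_mean[rat]) ** 2 for rat, v in rats.items() for y in v)

mse = ss_err / (a * b * (r - 1))        # 54 error DF
f_tech = (ss_tech / (a - 1)) / mse      # technician, tested against the MSE
f_rat = (ss_rat / (a * (b - 1))) / mse  # rat(technician)
print(f"MSE = {mse:.3f}, F(technician) = {f_tech:.2f}, F(rat(technician)) = {f_rat:.2f}")
```

If rats were instead treated as random, technician would usually be tested against the rat(technician) mean square rather than the MSE. The sketch follows the fixed-effects specification used in this section.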


In part (a) of Table 1.13, the least squares means are tabulated under the
“Estimate” column with their respective “Standard errors.” Rats under technician A
have statistically the same mean protein uptake as rats under technician B (part (b)).

The comparison of means for the rat subgroups showed similar means for the rats
under technician A but different means for the rats under technician B
(parts (a) and (b) of Table 1.14).
Table 1.14 Comparison of the means (LSD) of the subgroups nested within technicians

(a) Least squares means of rat (technician)
Technician   Rat   Estimate   Standard error   DF   t-value   Pr > |t|
A            4     1.2187     0.06003          54   20.30     <0.0001
A            5     1.2302     0.06003          54   20.49     <0.0001
A            6     1.1841     0.06003          54   19.72     <0.0001
B            1     1.3540     0.06003          54   22.56     <0.0001
B            2     1.0670     0.06003          54   17.77     <0.0001
B            3     1.0602     0.06003          54   17.66     <0.0001

(b) T grouping of the least squares means of rat (technician) (α = 0.05)
LS means with the same letter are not significantly different
Technician   Rat   Estimate
B            1     1.3540     A
A            5     1.2302     A  B
A            4     1.2187     A  B
A            6     1.1841     A  B
B            2     1.0670        B
B            3     1.0602        B


1.3.3 Two-Way Analysis of Variance with Interaction


This experiment is used when one wishes to test two factors, A and B, with a levels of
factor A and b levels of factor B. In this experiment, both factors are crossed, which
means that each level of A occurs in combination with each level of factor B. The
model with interaction is given by:

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}$$

for $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, n_{ij}$, and $e_{ijk} \sim N(0, \sigma^2)$. If all the
parameters of the model are fixed, then this model can be expressed as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$.
For this model with a = 3, b = 2, and $n_{ij}$ = 3, the matrix expression has the form:
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աղ 1 1 0 0 1 0 1 0 0 0 0 0 
Vio 1 1 0 0 1 0 1 0 0 0 0 0 
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3122 1 1 0 0 0 1 O 1 0 0 0 0 
Example This experiment consisted of developing an in vitro efficacy test for self-tanning
formulations. Two brands, 1 = erythrulose, 2 = dihydroxyacetone (factor
A), and three formulations, 1 = solution, 2 = gel, and 3 = cream (factor B), were
tested with four replicates for each condition according to Jermann et al. (2001).
Total color change was measured for each of the combination conditions. The
dataset is shown in Table 1.15.
Table 1.15 Color change (Y) in each of the brands and formulations

Brand   Formulation   Y       Brand   Formulation   Y
1       1             16.79   2       1             32.85
1       1             12.68   2       1             38.08
1       1             12.47   2       1             30.25
1       1             11.67   2       1             28.41
1       2             10.23   2       2             25.06
1       2             10.29   2       2             21.66
1       2             8.97    2       2             19.86
1       2             8.51    2       2             18.62
1       3             9.43    2       3             25.89
1       3             9.45    2       3             22.96
1       3             8.86    2       3             24.55
1       3             8.66    2       3             24.59


Table 1.16 Analysis of variance of the two-way model with interaction

Sources of variation   Degrees of freedom
Brand                  a - 1 = 2 - 1 = 1
Formulation            b - 1 = 3 - 1 = 2
Brand × formulation    (a - 1)(b - 1) = 1 × 2 = 2
Error                  ab(r - 1) = 2 × 3 × 3 = 18
Total                  abr - 1 = 2 × 3 × 4 - 1 = 23


For this two-way model, assuming that the response variable $y_{ijk}$ has a normal
distribution, the components are as follows:

Distribution: $y_{ijk} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where $y_{ijk}$ is the color change observed in the $k$th replicate at the $i$th level of factor A
and the $j$th level of factor B, $\mu$ is the intercept (the overall mean), $\alpha_i$ is the fixed
effect due to the level of factor A (brand), $\beta_j$ represents the fixed effect of the level of
factor B (type of formulation), and $\gamma_{ij}$ is the fixed effect due to the interaction
between brand and formulation. Table 1.16 shows the sources of variation and
degrees of freedom.

The following GLIMMIX code in SAS estimates the main effects
and the interaction:

proc glimmix;
class brand formulation;
model y = brand|formulation;
lsmeans brand|formulation/lines;
run;
Table 1.17 Results of the analysis of variance of the two-way model with interaction

(a) Fit statistics
-2 Res log likelihood         90.20
AIC (smaller is better)       104.20
AICC (smaller is better)      115.40
BIC (smaller is better)       110.43
CAIC (smaller is better)      117.43
HQIC (smaller is better)      105.06
Pearson's chi-square          99.61
Pearson's chi-square / DF     5.53

(b) Type III tests of fixed effects
Effect                Num DF   Den DF   F-value   Pr > F
Brand                 1        18       257.04    <0.0001
Formulation           2        18       22.99     <0.0001
Brand × formulation   2        18       4.68      0.0231


Table 1.18 Means and standard errors of the tanning brand

Least squares means of the brand
Brand   Estimate   Standard error   DF   t-value   Pr > |t|
1       10.6675    0.6791           18   15.71     <0.0001
2       26.0650    0.6791           18   38.38     <0.0001

T grouping of the least squares means of the brand (α = 0.05)
LS means with the same letter are not significantly different
Brand   Estimate
2       26.0650    A
1       10.6675    B


Part of the results is shown in Table 1.17. Of all the fit statistics in part (a), the
value that we want to highlight in this analysis is “Pearson's chi-square/DF,”
which corresponds to the mean squared error (MSE), since we are not comparing
alternative models for this dataset. The value of the MSE is 5.53. The type III fixed
effects tests in part (b) indicate that the type of brand (P < 0.0001), the formulation
(P < 0.0001), and the interaction between both factors (P = 0.0231) all have a
significant effect on the self-tanning color change.
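You can reproduce the F-values in part (b) of Table 1.17 with the classical sums of squares for a balanced two-way layout. This sketch assumes that the rows of Table 1.15 are grouped by formulation in blocks of four (solution, gel, cream), which is the grouping that matches the cell means in Table 1.20. The dictionary layout and names are illustrative only.

```python
# Balanced two-way ANOVA with interaction for Table 1.15 (r = 4 replicates per cell).
cells = {  # (brand, formulation): color change
    (1, 1): [16.79, 12.68, 12.47, 11.67],
    (1, 2): [10.23, 10.29, 8.97, 8.51],
    (1, 3): [9.43, 9.45, 8.86, 8.66],
    (2, 1): [32.85, 38.08, 30.25, 28.41],
    (2, 2): [25.06, 21.66, 19.86, 18.62],
    (2, 3): [25.89, 22.96, 24.55, 24.59],
}


def mean(values):
    return sum(values) / len(values)


brands = sorted({i for i, _ in cells})
forms = sorted({j for _, j in cells})
a, b, r = len(brands), len(forms), 4
cell_mean = {key: mean(v) for key, v in cells.items()}
grand = mean([y for v in cells.values() for y in v])
brand_mean = {i: mean([cell_mean[(i, j)] for j in forms]) for i in brands}
form_mean = {j: mean([cell_mean[(i, j)] for i in brands]) for j in forms}

ss_brand = b * r * sum((m - grand) ** 2 for m in brand_mean.values())
ss_form = a * r * sum((m - grand) ** 2 for m in form_mean.values())
ss_inter = r * sum(
    (cell_mean[(i, j)] - brand_mean[i] - form_mean[j] + grand) ** 2
    for i in brands
    for j in forms
)
ss_err = sum((y - cell_mean[key]) ** 2 for key, v in cells.items() for y in v)
mse = ss_err / (a * b * (r - 1))  # 18 error DF

f_values = {
    "brand": ss_brand / (a - 1) / mse,
    "formulation": ss_form / (b - 1) / mse,
    "brand x formulation": ss_inter / ((a - 1) * (b - 1)) / mse,
}
print(f"MSE = {mse:.2f}")
for effect, f in f_values.items():
    print(f"{effect:<20} F = {f:.2f}")
```

These closed-form sums of squares rely on equal replication in every cell. With unbalanced data, the type III tests reported by GLIMMIX would not reduce to these formulas.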

The least squares means obtained with “lsmeans” are shown in Table 1.18 for
the levels of the tanning brand factor, in Table 1.19 for the levels of the
formulation factor, and in Table 1.20 for the combinations of both factors (the interaction).
The “lines” option allows us to compare the means using the LSD method.
Table 1.19 Means and standard errors of the tanning brand formulation

Least squares means for the tanning brand formulation
Formulation   Estimate   Standard error   DF   t-value   Pr > |t|
1             22.9000    0.8317           18   27.53     <0.0001
2             15.4000    0.8317           18   18.52     <0.0001
3             16.7988    0.8317           18   20.20     <0.0001

T grouping of the least squares means (α = 0.05) of the tanning brand formulation
LS means with the same letter are not significantly different
Formulation   Estimate
1             22.9000    A
3             16.7988    B
2             15.4000    B

Table 1.20 Comparison of the means of the interaction of both factors

T grouping of the least squares means (α = 0.05) of brand*formulation
LS means with the same letter are not significantly different
Brand   Formulation   Estimate
2       1             32.3975    A
2       3             24.4975    B
2       2             21.3000    B
1       1             13.4025    C
1       2             9.5000     D
1       3             9.1000     D


The interaction hypothesis should be tested first, and the main effects should be
tested only if the interaction effect is not significant. If the interaction is significant,
then tests for the main effects are meaningless, because the effect of one factor depends
on the level of the other. The interaction analysis shows that brand 2 (dihydroxyacetone)
produces a greater color change than brand 1 (erythrulose) in all three formulations.

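Now consider the previous model without interaction ($\gamma_{11} = \gamma_{12} = \cdots = \gamma_{ab} = 0$),
where factor A has a levels and factor B has b levels. The model without interaction is
given by:

$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk}$$

for $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, n_{ij}$, and $e_{ijk} \sim N(0, \sigma^2)$. The model
without interaction with a = 3, b = 2, and $n_{ij}$ = 3 reduces to:

$$
\begin{bmatrix} \mathbf{y}_{11} \\ \mathbf{y}_{12} \\ \mathbf{y}_{21} \\ \mathbf{y}_{22} \\ \mathbf{y}_{31} \\ \mathbf{y}_{32} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\
\mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 \\
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 \\
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{1}_3
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta_1 \\ \beta_2 \end{bmatrix}
+
\begin{bmatrix} \mathbf{e}_{11} \\ \mathbf{e}_{12} \\ \mathbf{e}_{21} \\ \mathbf{e}_{22} \\ \mathbf{e}_{31} \\ \mathbf{e}_{32} \end{bmatrix}
$$

where $\mathbf{y}_{ij} = (y_{ij1}, y_{ij2}, y_{ij3})'$, $\mathbf{e}_{ij}$ is defined analogously, and $\mathbf{1}_3$ and $\mathbf{0}_3$ are 3 × 1 vectors of ones and zeros.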
Note that the design matrix for the model without interaction is the same as that
for the model with interaction, except that the last six columns are removed.
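You can check this statement numerically. The following sketch builds both design matrices for a = 3, b = 2, and $n_{ij}$ = 3, with the columns ordered as in the parameter vectors above. It confirms that the no-interaction matrix equals the first six columns of the interaction matrix and computes both ranks with exact fractions, showing that neither matrix has full column rank. `two_way_design` and `rank` are hypothetical helper names.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


def two_way_design(a, b, n, interaction=True):
    """Columns: mu, alpha_1..alpha_a, beta_1..beta_b and, optionally, gamma_11..gamma_ab."""
    rows = []
    for i in range(a):
        for j in range(b):
            row = [1] + [int(p == i) for p in range(a)] + [int(q == j) for q in range(b)]
            if interaction:
                row += [int((p, q) == (i, j)) for p in range(a) for q in range(b)]
            rows.extend(list(row) for _ in range(n))
    return rows


with_inter = two_way_design(3, 2, 3)
without_inter = two_way_design(3, 2, 3, interaction=False)
print(without_inter == [row[:6] for row in with_inter])  # True: last six columns dropped
print(rank(with_inter), "of", len(with_inter[0]))        # 6 of 12
print(rank(without_inter), "of", len(without_inter[0]))  # 4 of 6
```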

Let us assume that the interaction effect is not significant. The following SAS
code estimates the main effects of both factors. Running the program and analyzing
the results are left as an exercise for the reader.

proc glimmix;
class brand formulation;
model y = brand formulation;
lsmeans brand formulation/lines;
run;


1.4 Analysis of Covariance (ANCOVA)


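Consider an experiment to compare $t \geq 2$ treatments after adjusting for the effects of
a covariate $x$. The model for an analysis of covariance is given by:

$$y_{ij} = \mu + \tau_i + \beta_i x_{ij} + e_{ij}$$

for $i = 1, 2, \ldots, t$ and $j = 1, 2, \ldots, n_i$. Here, $e_{ij}$ are independent, normally distributed
random errors with zero mean and constant variance $\sigma^2 > 0$. In this model, $\mu$ is
the overall mean, $\tau_i$ is the fixed effect of the $i$th treatment (ignoring the covariates
$x_{ij}$), $\beta_i$ denotes the slope of the line that relates the response variable $y$ to $x$ for the
$i$th treatment, and $x_{ij}$ are fixed covariates. Assuming $t = 3$ and $n_1 = n_2 = n_3 = 3$,
we have:

$$
\begin{bmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 & x_{11} & 0 & 0 \\
1 & 1 & 0 & 0 & x_{12} & 0 & 0 \\
1 & 1 & 0 & 0 & x_{13} & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & x_{21} & 0 \\
1 & 0 & 1 & 0 & 0 & x_{22} & 0 \\
1 & 0 & 1 & 0 & 0 & x_{23} & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & x_{31} \\
1 & 0 & 0 & 1 & 0 & 0 & x_{32} \\
1 & 0 & 0 & 1 & 0 & 0 & x_{33}
\end{bmatrix}
\begin{bmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{bmatrix}
$$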
The analysis of covariance (ANCOVA), as can be seen, is a general linear
model of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$.

For example, consider a hypothetical study of flower production in two subspecies
of plants. The number of flowers per plant may vary between the subspecies, but, 
within each subspecies, flower production may also vary with the size of each plant, 
and this relationship may be positive or negative. A positive relationship might arise 
if plants with more resources (sunlight, water, nutrients) could invest more energy in 
both growth and flower production. A negative relationship could arise if there was a 
trade-off between the energy invested in growth and the energy invested in flower 
production. In this study, subspecies is a categorical variable and plant size is a 
continuous variable (the covariate). Measuring plant size and flower production in 
the two subspecies allows the investigation of three different questions: 


Is flower production influenced by subspecies?
Is flower production influenced by plant size?
Is the effect of plant size on flower production influenced by subspecies?


Example 1. A central question in plant reproductive ecology is how hermaphroditic
plant species allocate resources to male and female structures. A study
conducted to address this question counted the number of stamens (male structures
that produce pollen) and ovules (female structures that, when fertilized by a pollen
grain, will become seeds) in the flowers of “prairie larkspur” plants in two
populations in southeastern Minnesota. The total number of flowers produced by
each plant was also determined to assess whether plant size affected ovule production
per flower. The dataset for this example can be found in the Appendix (Data:
Larkspur plants).
An ANCOVA is appropriate for this study to test the following three null hypotheses from these data:


(a) There is no difference in the average number of ovules per flower between the 
two populations (the main effect). 

(b) There is no effect of plant size on the average number of ovules per flower (the 
covariate effect). 

(c) The effect of plant size on the mean number of ovules per flower does not differ between the two study sites (the interaction effect).


The components of the ANCOVA model, assuming that the response variable $y_{ij}$ is normally distributed, are as follows:

Distribution: $y_{ij} \mid \text{plant}_{j(i)} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \mu + \tau_i + \text{plant}_{j(i)} + \beta_i (X_{ij} - \bar{X}_{\cdot\cdot})$
Link function: $\mu_{ij} = \eta_{ij}$ (identity)

where $y_{ij}$ is the number of ovules observed in the $j$th plant of the $i$th population, $\mu$ is the overall mean, $\tau_i$ is the fixed effect due to population $i$, $\text{plant}_{j(i)}$ is the random effect due to plant $j$ in population $i$, assumed $\text{plant}_{j(i)} \sim N(0, \sigma^2_{\text{plant}})$, $\beta_i$ is the slope of population $i$, $\bar{X}_{\cdot\cdot}$ is the overall mean size of all plants, and $X_{ij}$ is the size of plant $j$ in population $i$. The ANCOVA results (sources of variation and degrees of freedom) are shown in Table 1.21.

The basic syntax in GLIMMIX for an analysis of covariance with different slopes is as follows:

proc glimmix;
   class population plant;
   /* xbar = plant size centered on the mean size of all plants */
   model ovules = population xbar population*xbar / ddfm=satterthwaite;
   random plant(population);   /* plants nested within populations */
   lsmeans population / lines;
run;


Table 1.21 Analysis of covariance

Sources of variation              Degrees of freedom
Population                        t - 1 = 2 - 1 = 1
β (dependence on plant size)      1
Population × β                    1
Error                             Σ(n_i - 1) - t = 75
Total                             Σn_i - 1 = 79 - 1 = 78
Table 1.22 Results of the analysis of covariance for the two populations of larkspur plants

(a) Covariance parameter estimates
Cov Parm             Estimate   Standard error
Plant(population)    12.7951    2.2416
Residual             0.9321

(b) Type III tests of fixed effects
Effect                 Num DF   F-value   Pr > F
Population             1        7.32      0.0084
Center                 1        16.39     0.0001
Center × population    1        7.81      0.0066

(c) Least squares means of population
Population   Estimate   Standard error   t-value   Pr > |t|
Cedar.cr     20.3538    0.6062           33.58     <0.0001
St. Croix    22.7596    0.6502           35.00     <0.0001

(d) T grouping of the least squares means of population (α = 0.05)
LS means with the same letter are not significantly different
Population   Estimate   Group
St. Croix    22.7596    A
Cedar.cr     20.3538    B


In the above syntax, the “class” statement lists all classification (categorical) variables but not the covariate (a continuous variable), which in this case is the plant size centered on the average size of all plants, $xbar = X_{ij} - \bar{X}_{\cdot\cdot}$. The options “ddfm” and “lines” instruct PROC GLIMMIX to apply a degrees-of-freedom correction using the Satterthwaite method and to compare the means using the LSD method, reported as letter groupings. Part of the results is shown in Table 1.22.

The estimates of the variance components (part (a)) due to plants and to within-treatment variability are $\hat\sigma^2_{\text{plant(population)}} = 12.795$ and $\hat\sigma^2 = \text{MSE} = 0.9321$, respectively. The Type III tests in part (b) show significant effects of population ($P = 0.0084$) and plant size ($P = 0.0001$) on the average number of ovules per flower, as well as a significant population × plant size interaction ($P = 0.0066$); that is, the effect of plant size on ovule production differs between the two populations. The estimated mean number of ovules for both populations, with the respective standard errors, is tabulated in the “Estimate” column of part (c), and the comparison of the means is shown in part (d).

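If in the previous model we assume that the slopes are equal ($\beta_1 = \beta_2 = \beta$), then the ANCOVA reduces to:

$$y_{ij} = \mu + \tau_i + \beta (X_{ij} - \bar{X}_{\cdot\cdot}) + \varepsilon_{ij}$$

With $t = 3$ and $n_1 = n_2 = n_3 = 3$, and writing $x_{ij} = X_{ij} - \bar{X}_{\cdot\cdot}$, the ANCOVA model for this case (equality of slopes) reduces to:

$$\begin{bmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23}\\ y_{31}\\ y_{32}\\ y_{33} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 & x_{11}\\ 1 & 1 & 0 & 0 & x_{12}\\ 1 & 1 & 0 & 0 & x_{13}\\ 1 & 0 & 1 & 0 & x_{21}\\ 1 & 0 & 1 & 0 & x_{22}\\ 1 & 0 & 1 & 0 & x_{23}\\ 1 & 0 & 0 & 1 & x_{31}\\ 1 & 0 & 0 & 1 & x_{32}\\ 1 & 0 & 0 & 1 & x_{33} \end{bmatrix} \begin{bmatrix} \mu\\ \tau_1\\ \tau_2\\ \tau_3\\ \beta \end{bmatrix} + \begin{bmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23}\\ \varepsilon_{31}\\ \varepsilon_{32}\\ \varepsilon_{33} \end{bmatrix}$$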
The basic syntax using GLIMMIX for an analysis of covariance with equal slopes is as follows:

proc glimmix;
   class population plant;
   /* common slope: no population*xbar interaction term */
   model ovules = population xbar / ddfm=satterthwaite;
   random plant(population);
   lsmeans population / lines;
run;


So far, we have exemplified the general linear model of the form $y = X\beta + e$. In the following, some characteristics of a linear mixed model (LMM) will be described.


1.5.1 Introduction


Linear mixed models (LMMs) are appropriate for analyzing continuous response 
variables in which the residuals are normally distributed. These types of models are 
well suited for studies of grouped datasets such as (1) students in classrooms, 
animals in herds, people grouped by municipality or geographic region, or random- 
ized block experimental designs such as batches of raw materials for an industrial 
process and (2) longitudinal or repeated measures studies, in which subjects are 
measured repeatedly over time or under different conditions. These designs occur 
in a wide variety of settings: biology, agriculture, industry, and socioeconomic 
sciences. LMMs provide researchers with powerful and flexible analytical 
tools for these types of data. 

The name linear mixed models comes from the fact that these models are linear in 
the parameters and that the covariates, or independent variables, may involve a 
combination of fixed and random effects. “Fixed effects” can be associated with 
continuous covariates, such as weight in kilograms of an animal, maize yield in tons 
per hectare, and reference test score or socioeconomic status, which will carry a 
continuous range of values, or with factors, such as gender, variety, or group 
treatment, which are categorical. Fixed effects are unknown constant parameters associated with continuous covariates or levels of the categorical factors in an LMM. The estimation of these parameters in LMMs is generally of intrinsic interest because they indicate the relationship of the covariates with the continuous response variable.

When the levels of a factor are a random sample from a large population of possible levels, so that the particular levels themselves are not of interest (e.g., classrooms, regions, herds, or clinics that are randomly sampled from a population), the effects associated with the levels of those factors can be modeled as random effects in an LMM. “Random effects” are represented by random (unobserved) variables that we generally assume to have a particular distribution, most commonly the normal distribution.

Mixed models are extremely useful because they allow us to address two important issues:


1. From a statistical point of view, biological data are often structured in a way that does not satisfy the assumption that the observations are independent. Examples include the following:


(a) Multiple measurements of the same subject/organism 

(b) Experiments organized into spatial blocks 

(c) Observational data in which multiple investigations were conducted in dif- 
ferent locations 

(d) Synthesis of data from similar experiments that were performed by different 
researchers 


2. From a biological perspective, the processes being measured can be affected by 
multiple sources of variation, often occurring at different spatial or temporal 
scales. We are interested in using statistical methods that can model multiple 
sources of stochasticity, at multiple scales, so that we can measure the relative 
magnitude of the different sources of variation and determine which predictors 
explain variation at different scales. 


1.5.2 Mixed Models 


The matrix notation for a mixed model is very similar to that for a fixed effects (systematic) model. The main difference is that, instead of using only one design matrix to describe the systematic part of the model, the matrix notation for a mixed model uses at least two design matrices: a design matrix X to describe the fixed effects and a design matrix Z to describe the random effects. The fixed effects design matrix X is constructed in the same way as in a general linear fixed effects model ($y = X\beta + e$). X has dimension $n \times (p + 1)$, where $n$ is the number of observations in the dataset and $p + 1$ is the number of fixed effects parameters to be estimated. The design matrix Z for the random effects is constructed in the same way as the design matrix for the fixed effects, but for the random effects. Z has dimension $n \times q$, where $q$ is the number of random effects coefficients in the model.
In matrix notation, a linear mixed model can be represented as

$$y = \underbrace{X\beta}_{\text{systematic}} + \underbrace{Zb}_{\text{random}} + \underbrace{e}_{\text{experimental error}} \tag{1.2}$$

$$b \sim N(0, G) \quad \text{and} \quad e \sim N(0, R)$$

where $y$ is the $n \times 1$ vector of observations, $\beta$ is the $(p + 1) \times 1$ vector of fixed effects, $b$ is the $q \times 1$ vector of random effects, $e$ is the $n \times 1$ vector of random error terms, $X$ is the $n \times (p + 1)$ design matrix relating the observations to $\beta$, and $Z$ is the $n \times q$ design matrix relating the observations to the random effects $b$.

Assuming that $b$ and $e$ are uncorrelated random vectors with zero mean and variance-covariance matrices $G$ and $R$, respectively, we have

$$E(b) = 0, \quad E(e) = 0, \quad \operatorname{Var}(b) = G, \quad \operatorname{Var}(e) = R, \quad \operatorname{Cov}(b, e) = 0$$

It is not difficult to verify that

$$\operatorname{Var}(y) = \operatorname{Var}(X\beta + Zb + e) = ZGZ^\top + R = V$$
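The verification takes one step. $X\beta$ is a fixed (non-random) vector, so it contributes nothing to the variance; $\operatorname{Var}(Zb) = Z\operatorname{Var}(b)Z^\top$ because $Z$ is a matrix of constants; and the cross-covariance terms vanish because $\operatorname{Cov}(b, e) = 0$:

$$\begin{aligned} \operatorname{Var}(y) &= \operatorname{Var}(Zb + e) \\ &= Z\operatorname{Var}(b)Z^\top + \operatorname{Var}(e) + Z\operatorname{Cov}(b, e) + \operatorname{Cov}(e, b)Z^\top \\ &= ZGZ^\top + R + 0 + 0 = V \end{aligned}$$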


Matrix V is an important component when working with linear mixed models (LMMs) because it contains the random sources of variation and also defines how such models differ from ordinary least squares estimation. If the random part of the model consists only of random effects, such as the blocks of a randomized complete block design (RCBD), then matrix G is the main focus of attention. On the other hand, for repeated measures or spatial analysis, matrix R is extremely important. Assuming that the random effects (blocks) and the errors are normally distributed,

$$b \sim N(0, G) \quad \text{and} \quad e \sim N(0, R),$$

the vector of observations $y$ will also have a normal distribution, that is, $y \sim N(X\beta, V)$. The same model can be written in probability distribution form in two different but equivalent ways. The first is the marginal model

$$y \sim N\left(E[y] = X\beta,\; V = ZGZ^\top + R\right) \tag{1.3}$$

In this marginal model, the mean is based only on the fixed effects, and the parameters describing the random effects appear in the variance-covariance matrix $V$ (Littell et al. 2006). In general, a structure is imposed on $V$ through $\operatorname{Var}(b) = G$, and, therefore, marginally, the covariances among the components of $y$ depend on the structure of $V = ZGZ^\top + R$.

The second is the conditional model

$$y \mid b \sim N(X\beta + Zb,\; R) \tag{1.4}$$

In this conditional model, $b$ is distributed as specified with Eq. (1.2), that is, $b \sim N(0, G)$. For LMMs, the two models are equivalent; but if the response variable is modeled under a non-normal distribution, then the models are different (Stroup 2012), and generalized linear mixed models are required.

The fixed effects estimator $\hat\beta$ provides the best linear unbiased estimators (commonly known as BLUEs) of the fixed effects, whereas the predictor $\hat b$ provides the best linear unbiased predictors (commonly known as BLUPs) of the random effects $b$. Estimation based on the expected value of the marginal LMM (1.3) yields the BLUEs, and estimation based on the conditional LMM (1.4) yields the BLUPs. The BLUEs of $\beta$ and the BLUPs of $b$ are as follows:

$$\hat\beta = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y$$

$$\hat b = G Z^\top V^{-1} \left(y - X\hat\beta\right)$$

This solution is practical for small datasets, but with large datasets it is computationally very demanding because the $n \times n$ matrix $V$ has to be inverted. For this reason, the BLUEs of $\beta$ and the BLUPs of $b$ are normally obtained from Henderson's mixed model equations, which are presented later in this chapter.
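As a numerical illustration, the sketch below applies these two formulas in Python/NumPy to the piglet weights of Table 1.23 (introduced later in this section), treating the litter and residual variances reported in Table 1.25 as known. It is a sketch under those assumptions, not the GLIMMIX algorithm: it simply inverts $V$ directly, which is feasible here because $n = 9$. The diets are coded as cell means (one column of $X$ per diet, no intercept), a choice made so that $\hat\beta$ reads directly as the diet means.

```python
import numpy as np

# Piglet weights (kg) from Table 1.23: rows = litters (blocks), columns = diets
w = np.array([[54.3, 53.1, 59.7],
              [53.6, 52.4, 59.7],
              [55.2, 57.1, 67.2]])
b_lev, t_lev = w.shape
y = w.ravel()                                         # records sorted litter by litter

X = np.tile(np.eye(t_lev), (b_lev, 1))                # cell-means coding for the diets
Z = np.kron(np.eye(b_lev), np.ones((t_lev, 1)))       # maps each record to its litter

# Variance components from Table 1.25, treated here as known
s2_litter, s2_e = 5.3117, 3.2961
G = s2_litter * np.eye(b_lev)
V = Z @ G @ Z.T + s2_e * np.eye(len(y))

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # BLUEs of the diet means
b_hat = G @ Z.T @ Vinv @ (y - X @ beta_hat)                  # BLUPs of the litter effects
print(np.round(beta_hat, 4))
print(np.round(b_hat, 4))
```

Because this design is balanced, $\hat\beta$ equals the raw diet means, and each BLUP is the litter's deviation from the grand mean shrunk by the factor $t\sigma_b^2 / (\sigma^2 + t\sigma_b^2) \approx 0.83$.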


1.5.3 Distribution of the Response Variable Conditional on Random Effects (y | b)


The distribution selected by the researcher for the population under study should be the true distribution of the response variable or a good approximation to it. A good representation of the population distribution of a response variable should take into account not only the nature of the response variable (e.g., continuous, discrete) and the shape of the distribution but also the relationship between the mean and the variance. In this chapter, we assume that the response, conditional on the random effects, is normally distributed with mean $\mu_{ij}$ and variance $\sigma^2$, $\{y_{ij} \mid b_j \sim N(\mu_{ij}, \sigma^2)\}$, and that the random effects are normally distributed with mean 0 and constant variance $\sigma_b^2$, $\{b_j \sim N(0, \sigma_b^2)\}$.
1.5.4 Types of Factors and Their Related Effects on LMMs


In an LMM, there are two types of factors: fixed factors, which make up the systematic part of the model, and random factors, which make up the stochastic part; each has its related effects on the dependent variable (response). In the following sections, we provide a brief description of these factors and their implications in the context of an LMM.


1.5.4.1 Fixed Factors 


A fixed factor is commonly used in standard analysis of variance (ANOVA) or 
analysis of covariance (ANCOVA) models. It is defined as a categorical or classi- 
fication variable, for which the researcher has included all levels (or conditions) in 
the model that are of interest in the study. Fixed factors may include qualitative 
covariates, such as gender; classification variables implied by a sampling design, 
such as a region or a stratum, or by a study design, such as the method of treatment in 
a randomized clinical trial; and so on. The levels of a fixed factor are chosen to 
represent specific conditions so that they can be used to define contrasts (or sets of 
contrasts) of interest in the research study. 


1.5.4.2 Random Factors


A random factor is a classification variable whose levels are randomly sampled from a population of possible levels. Not all possible levels of a random factor are present in the dataset; rather, the researcher intends to make inferences about the entire population of levels from the selected sample of levels. Random factors are included in an analysis so that the variation in the dependent variable across the levels of the random factor can be evaluated and the results of the data analysis can be generalized to all levels of the random factor in the population.


1.5.4.3 Fixed Versus Random Factors 


In contrast to fixed factor levels, random factor levels do not represent conditions 
specifically chosen to meet the objectives of the study. However, depending on the 
objectives of the study, the same factor may be considered as either a fixed factor or a 
random factor. 

Fixed effects, commonly referred to as regression coefficients or fixed effect 
parameters, describe the relationships between the dependent variable and predictor 
variables (i.e., fixed factors or continuous covariates) for an entire population of 
units of analysis or for a relatively small number of subpopulations defined by the 
levels of a fixed factor. Fixed effects may describe the contrasts or differences 
between levels of a fixed factor (e.g., between males and females) in the mean responses for a continuous dependent variable or may describe the effect of a continuous covariate on the dependent variable. Fixed effects are assumed to be unknown fixed quantities in an LMM and are estimated from the data collected in a study.

Random effects are random values associated with the levels of a random factor (or factors) in an LMM. These values, which are specific to a given level of a random factor, generally represent random deviations from the relationships described by the fixed effects. For example, random effects associated with the levels of a random factor may enter an LMM as random intercepts (random deviations of a given subject or group from the overall intercept) or as random coefficients (random deviations of a given subject or group from the overall fixed effects, such as slopes). In contrast to fixed effects, random effects are represented as stochastic variables in an LMM.


1.5.5 Nested Versus Crossed Factors and Their 
Corresponding Effects 


When a given level of one factor (random or fixed) can be measured only at a single level of another factor and not across multiple levels, the levels of the first factor are said to be nested within the levels of the second factor. The effects of the nested factor on the response variable are known as nested effects. For example, suppose that you want to conduct a study of primary education in a school zone: you would randomly select schools and then classrooms within those schools. Classroom levels (one of the random factors) are nested within school levels (another random factor), since each classroom appears within only a single school.

When a given level of one factor (random or fixed) can be measured across multiple levels of another factor, the two factors are said to be crossed, and the effects of these factors on the dependent variable are known as crossed effects. For example, when every variety is tested at every location, variety and location are crossed.


1.5.6 Estimation Methods 


Standard methods of estimation in mixed models with a normal response are maximum likelihood (ML) and restricted maximum likelihood (REML). The linear mixed effects model is as follows:

$$y = X\beta + Zb + e$$

The variance-covariance matrix V for a one-way analysis of variance (ANOVA) with a random block effect and six observations (two observations in each of three blocks, listed block by block) is equal to:

$$V = \operatorname{Var}(y) = ZGZ^\top + \sigma^2 I = \begin{bmatrix} \sigma_b^2 + \sigma^2 & \sigma_b^2 & 0 & 0 & 0 & 0\\ \sigma_b^2 & \sigma_b^2 + \sigma^2 & 0 & 0 & 0 & 0\\ 0 & 0 & \sigma_b^2 + \sigma^2 & \sigma_b^2 & 0 & 0\\ 0 & 0 & \sigma_b^2 & \sigma_b^2 + \sigma^2 & 0 & 0\\ 0 & 0 & 0 & 0 & \sigma_b^2 + \sigma^2 & \sigma_b^2\\ 0 & 0 & 0 & 0 & \sigma_b^2 & \sigma_b^2 + \sigma^2 \end{bmatrix}$$

The variance of $y_1$ is $V_{11} = \sigma_b^2 + \sigma^2$, and the covariance between $y_1$ and $y_2$ is $V_{12} = V_{21} = \sigma_b^2$ because these two observations come from the same block. The covariance between $y_1$ and the observations in the other blocks is zero. All possible covariances can be found in matrix $V$.
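A short Python/NumPy sketch reproduces this block-diagonal structure by forming $ZGZ^\top + \sigma^2 I$ directly. The helper name `rcbd_covariance` and the variance values ($\sigma_b^2 = 2$, $\sigma^2 = 1$) are hypothetical; the observations are assumed to be sorted block by block (observations 1-2 in block 1, 3-4 in block 2, 5-6 in block 3).

```python
import numpy as np

def rcbd_covariance(n_blocks, per_block, s2_block, s2_e):
    """Return V = Z G Z' + s2_e * I for observations sorted block by block."""
    Z = np.kron(np.eye(n_blocks), np.ones((per_block, 1)))  # block incidence matrix
    G = s2_block * np.eye(n_blocks)                          # Var(b): independent blocks
    return Z @ G @ Z.T + s2_e * np.eye(n_blocks * per_block)

# Three blocks with two observations each (hypothetical variance values)
V = rcbd_covariance(n_blocks=3, per_block=2, s2_block=2.0, s2_e=1.0)
print(V)
```

Diagonal entries equal $\sigma_b^2 + \sigma^2 = 3$, entries for pairs in the same block equal $\sigma_b^2 = 2$, and all other entries are zero.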


1.5.6.1 Maximum Likelihood 


The likelihood function $l$ is a function of the observations and the model parameters. It measures how likely the observed data $y$ are for a given set of model parameters $\beta$ and $b$. The log-likelihood functions of $y \mid b$ and of $b$ for a mixed model are given by:

$$l(y \mid b) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|R| - \frac{1}{2}(y - X\beta - Zb)^\top R^{-1} (y - X\beta - Zb)$$

and

$$l(b) = -\frac{N_b}{2}\log(2\pi) - \frac{1}{2}\log|G| - \frac{1}{2} b^\top G^{-1} b$$

where $N_b$ represents the total number of random effect levels. Therefore, the joint log-likelihood of $y$ and $b$ is equal to:

$$l(y, b) = l(y \mid b) + l(b) = -\frac{n + N_b}{2}\log(2\pi) - \frac{1}{2}\log|R| - \frac{1}{2}\log|G| - \frac{1}{2}(y - X\beta - Zb)^\top R^{-1} (y - X\beta - Zb) - \frac{1}{2} b^\top G^{-1} b$$

Differentiating this expression with respect to $\beta$ and $b$ gives:

$$\frac{\partial l(y, b)}{\partial \beta} = X^\top R^{-1} y - X^\top R^{-1} X\beta - X^\top R^{-1} Z b$$

$$\frac{\partial l(y, b)}{\partial b} = Z^\top R^{-1} y - Z^\top R^{-1} X\beta - \left(Z^\top R^{-1} Z + G^{-1}\right) b$$

Setting these derivatives to zero and solving for $\beta$ and $b$, we obtain the following linear mixed model equations:
$$\begin{bmatrix} X^\top R^{-1} X & X^\top R^{-1} Z\\ Z^\top R^{-1} X & Z^\top R^{-1} Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat\beta\\ \hat b \end{bmatrix} = \begin{bmatrix} X^\top R^{-1} y\\ Z^\top R^{-1} y \end{bmatrix}$$

The solution can be written as:

$$\begin{bmatrix} \hat\beta\\ \hat b \end{bmatrix} = \begin{bmatrix} X^\top R^{-1} X & X^\top R^{-1} Z\\ Z^\top R^{-1} X & Z^\top R^{-1} Z + G^{-1} \end{bmatrix}^{-1} \begin{bmatrix} X^\top R^{-1} y\\ Z^\top R^{-1} y \end{bmatrix}$$

Here, $\hat\beta$ is the vector of estimated fixed effects and $\hat b$ is the vector of predicted random effects. These solutions depend on the two covariance matrices $G$ and $R$ and no longer require $V$, as the previous solution did. Moreover, this solution, known as Henderson's (1950) mixed model equations, is computationally much more efficient than the previous one for $\beta$ and $b$ because it does not require the inverse of the $n \times n$ matrix $V = ZGZ^\top + R$. These mixed model equations assume that the components of $G$ and $R$ are known; in practice, they need to be estimated. The following is a popular, extremely versatile, and powerful method for estimating the variance components of $G$ and $R$.
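To check numerically that Henderson's equations give the same answer as the $V^{-1}$-based formulas, the following Python/NumPy sketch builds and solves the mixed model equations for the piglet data of Table 1.23, again treating the Table 1.25 variance components as known and coding the diets as cell means. Both are assumptions made for illustration; a dense `np.linalg.solve` stands in for the sparse solvers that production software would use.

```python
import numpy as np

# Piglet weights (kg) from Table 1.23: rows = litters, columns = diets
w = np.array([[54.3, 53.1, 59.7],
              [53.6, 52.4, 59.7],
              [55.2, 57.1, 67.2]])
n_litter, n_diet = w.shape
y = w.ravel()                                          # sorted litter by litter
X = np.tile(np.eye(n_diet), (n_litter, 1))             # diet cell means
Z = np.kron(np.eye(n_litter), np.ones((n_diet, 1)))    # litter incidence

s2_litter, s2_e = 5.3117, 3.2961                       # Table 1.25, treated as known
Ginv = np.eye(n_litter) / s2_litter
Rinv = np.eye(len(y)) / s2_e

# Henderson's mixed model equations: C @ [beta; b] = rhs
C = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
sol = np.linalg.solve(C, rhs)
beta_mme, b_mme = sol[:n_diet], sol[n_diet:]

# Cross-check with the V^{-1}-based BLUE/BLUP formulas
V = s2_litter * Z @ Z.T + s2_e * np.eye(len(y))
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
b_blup = s2_litter * Z.T @ Vinv @ (y - X @ beta_gls)
print(np.allclose(beta_mme, beta_gls), np.allclose(b_mme, b_blup))  # True True
```

Only $R^{-1}$ and $G^{-1}$ enter the equations; here both are diagonal, which is what makes this route attractive when $n$ is large.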


1.5.6.2 Restricted Maximum Likelihood Estimation 


The restricted maximum likelihood method, also known as the residual maximum likelihood method, is extremely useful, among other things, for estimating variance components. This method is also based on maximum likelihood, but, instead of maximizing the likelihood function of the original data, it maximizes the likelihood of a set of linear combinations of the data (error contrasts) from which the fixed effects have been removed. That is, instead of maximizing over $y$, the likelihood is maximized over $Ky$, where $K$ is a matrix of constants such that $KX = 0$, which implies that:

$$E(Ky) = E(KX\beta + KZb + Ke) = 0$$

$$\operatorname{Var}(Ky) = KVK^\top$$

This implies that $Ky \sim N(0, KVK^\top)$, and the likelihood of $Ky$ is called the restricted (residual) likelihood; maximizing it gives the REML estimates. There are many options for choosing $K$; a typical choice is based on $I - X(X^\top X)^{-1}X^\top$, the ordinary least squares residual operator, keeping $n - p$ linearly independent rows, where $p = \operatorname{rank}(X)$, so that $KVK^\top$ is nonsingular. Therefore, the log-likelihood of $Ky$ is equal to

$$l(V \mid Ky) = -\frac{n - p}{2}\log(2\pi) - \frac{1}{2}\log\left|KVK^\top\right| - \frac{1}{2}(Ky)^\top \left(KVK^\top\right)^{-1} (Ky)$$


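After some algebra, according to Stroup (2012), this log-likelihood can be written, up to an additive constant that does not depend on $V$, as:

$$l(V \mid Ky) = -\frac{n - p}{2}\log(2\pi) - \frac{1}{2}\log|V| - \frac{1}{2}\log\left|X^\top V^{-1} X\right| - \frac{1}{2} r^\top V^{-1} r$$

where $p = \operatorname{rank}(X)$, $r = y - X\hat\beta_{\text{GLS}}$, and $\hat\beta_{\text{GLS}} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y$.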

The variance components of $G$ and $R$ are estimated with iterative methods, such as the Newton-Raphson or Fisher scoring method, which maximize the function $l(V \mid Ky)$ with respect to the variance components. The process starts with initial values for the variance components, from which $G$ and $R$ are built; with these values of $G$ and $R$, refined estimates of the parameters $\beta$ and $b$ are obtained; these are then used to update the estimates of the variance components in $G$ and $R$, and the process continues until the convergence criterion is met.
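The sketch below evaluates the restricted log-likelihood in the form attributed above to Stroup (2012), dropping the constant, for the piglet RCBD of Table 1.23 and maximizes it over $\sigma_b^2$ and $\sigma^2$. To keep it short, a crude shrinking grid search over the log-variances is swapped in for Newton-Raphson or Fisher scoring, so treat it as an illustration of the objective function, not of GLIMMIX's optimizer. The cell-means coding of the diets and the function name `reml_loglik` are our assumptions. For this balanced design, the maximizer should coincide with the variance components reported in Table 1.25.

```python
import numpy as np

# Piglet weights (kg) from Table 1.23: rows = litters, columns = diets
w = np.array([[54.3, 53.1, 59.7],
              [53.6, 52.4, 59.7],
              [55.2, 57.1, 67.2]])
n_litter, n_diet = w.shape
y = w.ravel()                                          # sorted litter by litter
X = np.tile(np.eye(n_diet), (n_litter, 1))             # diet cell means
Z = np.kron(np.eye(n_litter), np.ones((n_diet, 1)))    # litter incidence

def reml_loglik(s2_litter, s2_e):
    """Restricted log-likelihood in Stroup's form, without the constant term."""
    V = s2_litter * Z @ Z.T + s2_e * np.eye(len(y))
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    beta_gls = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
    r = y - X @ beta_gls
    return -0.5 * (np.linalg.slogdet(V)[1]
                   + np.linalg.slogdet(XtVinvX)[1]
                   + r @ Vinv @ r)

# Shrinking grid search over (log s2_litter, log s2_e): a stand-in for
# Newton-Raphson / Fisher scoring, chosen only to keep the sketch short.
center, half_width = np.zeros(2), 4.0
for _ in range(30):
    steps = np.linspace(-half_width, half_width, 21)
    grid = [center + np.array([u, v]) for u in steps for v in steps]
    center = max(grid, key=lambda c: reml_loglik(*np.exp(c)))
    half_width /= 2

s2_litter_hat, s2_e_hat = np.exp(center)
print(f"litter variance = {s2_litter_hat:.4f}, residual variance = {s2_e_hat:.4f}")
```

Searching on the log scale keeps both variances positive; a real implementation would also handle estimates on the boundary (a variance of zero), which this sketch does not.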


1.5.7 One-Way Random Effects Model 


Suppose that we randomly select $a$ levels from a sufficiently large population of levels of the factor of interest. In this case, we say that the factor is random. Random factors are usually categorical. Continuous covariates, whose levels are not randomly sampled, are generally treated as “systematic” or “fixed” effects (e.g., linear, quadratic, or even exponential terms). Random effects are not systematic. Let us assume a simple one-way model:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n_i$$

However, in this case, the treatment effects and the error terms are random variables, i.e., $\tau_i \sim N(0, \sigma_\tau^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$, respectively. The terms $\tau_i$ and $\varepsilon_{ij}$ are uncorrelated, and their variances $\sigma_\tau^2$ and $\sigma^2$ are commonly referred to as “variance components.”

There can be some confusion about the differences between noise factors and 
random factors. Noise factors can be fixed or random. 

Factors are random when we think of them as coming from a random sample of a larger population, and their effect is not systematic. It is not always clear when a factor is random. For example, suppose that the vice president of a chain of stores is interested in the effects of implementing a management policy in his stores. If the experiment includes all five existing stores, he might consider “the store” a fixed factor because the levels of the factor “store” do not come from a random sample. However, if the chain has 100 stores and 5 of them are randomly selected for the experiment, because the company is considering rapid expansion and plans to implement the selected new policy at the new locations, then “store” could be considered a random factor.

In fixed effects models, the researcher's interest would focus on testing the equality of the treatment (store) means. This would not be appropriate, however, for the case in which 5 stores are randomly selected out of 100, because the treatments are randomly selected and we are interested in the population of treatments (stores), not in a particular store or group of stores. The appropriate hypothesis test for this random effects model would be

$$H_0: \sigma_\tau^2 = 0 \quad \text{vs.} \quad H_a: \sigma_\tau^2 > 0$$

The standard analysis of variance partition of the total sum of squares still works; however, the form of the appropriate test statistic depends on the expected mean squares. In this case, the appropriate test statistic is

$$F_0 = \frac{\text{Mean Square}_{\text{Treatments}}}{\text{Mean Square}_{\text{Error}}}$$

Under $H_0$, $F_0$ follows an F-distribution (Fisher-Snedecor) with $a - 1$ degrees of freedom in the numerator and $N - a$ in the denominator, where $N = \sum_{i=1}^{a} n_i$.

In a completely random model, we are interested in estimating the variance components $\sigma^2$ and $\sigma_\tau^2$. To do so, we use the analysis of variance method, which consists of equating the expected mean squares to their observed values. For a balanced design with $n$ observations per level, this gives

$$\hat\sigma^2 + n\hat\sigma_\tau^2 = \text{Mean Square}_{\text{Treatments}}, \qquad \hat\sigma^2 = \text{Mean Square}_{\text{Error}}$$

and therefore

$$\hat\sigma_\tau^2 = \frac{\text{Mean Square}_{\text{Treatments}} - \hat\sigma^2}{n}$$
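The analysis of variance method amounts to a few lines of arithmetic. This Python/NumPy sketch assumes a balanced design (equal $n$ per level) and uses a small hypothetical dataset; the function name `one_way_random_anova` is ours. Note that $\hat\sigma_\tau^2$ can come out negative when the treatment mean square is smaller than the error mean square, a known drawback of this method.

```python
import numpy as np

def one_way_random_anova(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random model.

    groups: equal-length sequences, one per randomly chosen level.
    Returns (F0, sigma2_hat, sigma2_tau_hat).
    """
    data = np.asarray(groups, dtype=float)       # shape (a, n)
    a, n = data.shape
    level_means = data.mean(axis=1)
    grand_mean = data.mean()
    ms_trt = n * np.sum((level_means - grand_mean) ** 2) / (a - 1)
    ms_err = np.sum((data - level_means[:, None]) ** 2) / (a * n - a)
    sigma2_tau = (ms_trt - ms_err) / n            # may be negative in small samples
    return ms_trt / ms_err, ms_err, sigma2_tau

# Hypothetical data: a = 3 randomly chosen levels, n = 2 observations each
F0, s2, s2_tau = one_way_random_anova([[1, 3], [4, 6], [7, 9]])
print(F0, s2, s2_tau)   # 9.0 2.0 8.0
```

Here $F_0 = 9$ would be compared with an F-distribution with $a - 1 = 2$ and $N - a = 3$ degrees of freedom.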


1.5.8 Analysis of Variance Model of a Randomized 
Block Design 


Consider a one-way analysis of variance model with an additive randomized block effect. Assume two treatments and three blocks:

$$y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}$$

where $b_j \sim N(0, \sigma_b^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$, with $i = 1, 2$ and $j = 1, 2, 3$. The random effects $b_j$ and $\varepsilon_{ij}$ are independent and uncorrelated. In addition, treatment effects are assumed to be fixed. The matrix notation of this model is as follows:




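$$\begin{bmatrix} y_{11}\\ y_{21}\\ y_{12}\\ y_{22}\\ y_{13}\\ y_{23} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 1 & 0\\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mu\\ \tau_1\\ \tau_2 \end{bmatrix}}_{\text{systematic}} + \underbrace{\begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}}_{\text{random}} + \begin{bmatrix} \varepsilon_{11}\\ \varepsilon_{21}\\ \varepsilon_{12}\\ \varepsilon_{22}\\ \varepsilon_{13}\\ \varepsilon_{23} \end{bmatrix}$$

where $b \sim N(0, G)$ and $e \sim N(0, R)$. The variance-covariance matrix $G$ for the random effects in this case is a $3 \times 3$ diagonal matrix with diagonal elements $\sigma_b^2$ (that is, $G = \sigma_b^2 I_3$), and $R = \sigma^2 I_6$. Note how the matrix representation of this model corresponds exactly to the mixed model formulation, that is,

$$y = X\beta + Zb + e, \quad \text{where } b \sim N(0, G) \text{ and } e \sim N(0, R).$$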


Example. An animal nutritionist is interested in comparing the effect of three diets
on weight gain in piglets. To conduct the experiment, the nutritionist randomly 
selects 3 litters from a set of 20, each containing 3 healthy, similar-sized, recently 
weaned piglets. In each litter, three piglets are selected and each piglet is randomly 
assigned to a treatment. 


A randomized complete block design (RCBD) is a variation of the completely randomized design (CRD). In this design, blocks of experimental units are chosen so that the units within each block are as homogeneous as possible and the blocks differ from one another. In a randomized complete block design, each block generally contains one experimental unit per treatment, although more than one experimental unit per treatment in each block is also possible.

An RCBD has two sources of variation: the factor of interest that includes the 
treatments to be studied and the “block factor" that identifies the litters used in the 
experiment. 

Assumptions in RCBD: 


1. Sampling: Blocks (litters) are independently randomly selected and treatments 
are randomly assigned to each of the experimental units within each block. 

2. Errors are independent and identically normally distributed with zero mean and constant variance $\sigma^2$.


Table 1.23 lists the weights in kilograms of piglets from three different litters under three different diets. To make inferences about the pattern of weight gain for the entire population of litters, the litters must be included in the model as a random effect. Thus, the linear mixed model describing the variability of piglet weight gain in this study, as a function of diet, is as follows:
Table 1.23 Weight (kilograms) of the three litters of piglets

Litter   Diet 1   Diet 2   Diet 3
1        54.3     53.1     59.7
2        53.6     52.4     59.7
3        55.2     57.1     67.2

Table 1.24 Analysis of variance of the randomized complete block design

Sources of variation   Degrees of freedom
Blocks (litters)       b - 1 = 3 - 1 = 2
Diet                   t - 1 = 3 - 1 = 2
Error                  (t - 1)(b - 1) = 4
Total                  tb - 1 = 8

$$y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij} \quad \text{for } i = 1, 2, 3;\ j = 1, 2, 3$$

where $y_{ij}$ is the weight observed for the piglet receiving diet $i$ in litter $j$, $\mu$ is the overall mean, $\tau_i$ is the fixed effect due to the $i$th diet, $b_j$ is the random effect due to the $j$th block (litter), assuming $b_j \sim N(0, \sigma_b^2)$, and $\varepsilon_{ij}$ is the independent and identically distributed normal error term with mean 0 and variance $\sigma^2$, i.e., $\varepsilon_{ij} \sim N(0, \sigma^2)$. The random effects $b_j$ and $\varepsilon_{ij}$ are assumed to be independent and uncorrelated. Table 1.24 shows an outline of the analysis of variance for this dataset.
The SAS program to analyze this dataset is as follows: 


proc glimmix data=piglets;
   class litter diet;
   model gain = diet / ddfm=satterthwaite;
   random litter;                        /* litters (blocks) as a random effect */
   lsmeans diet / lines;
   contrast "Diet1 vs Diet2" diet 1 -1  0;
   contrast "Diet2 vs Diet3" diet 0  1 -1;
run; quit;


In the previous syntax, two options are of particular importance in this example: (1) “ddfm=satterthwaite” applies a correction to the denominator degrees of freedom, which is especially important when the number of experimental units (EU) differs among treatments, and (2) “lines” displays the least squares means requested with “lsmeans” grouped by letters; means that do not share a letter are significantly different.

The output for this code is shown in Table 1.25. Part (a) of this table shows the estimated variance due to litter ($\hat\sigma^2_{\text{litter}} = 5.3117$) and the mean squared error ($\hat\sigma^2 = 3.2961$). The analysis of variance in part (b) shows a highly significant effect of diet on piglet weight gain ($P = 0.0091$). Part (c) gives the estimated means and their standard errors (obtained with “lsmeans diet/lines”), and part (d) groups the means that are not statistically different. These last results show that the weight gain of piglets under diets I and II is not statistically different, but both differ significantly from that under diet III.

Table 1.25 Results of the analysis of the three different diets tested on piglet weight gain

(a) Covariance parameter estimates
Cov Parm   Estimate   Standard error
Litter     5.3117     6.4573
Residual   3.2961     2.3307

(b) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
Diet     2        4        19.02     0.0091

(c) Dietary least squares means
Diet   Estimate   Standard error   DF      t-value   Pr > |t|
I      54.3667    1.6939           3.406   32.10     <0.0001
II     54.2000    1.6939           3.406   32.00     <0.0001
III    62.2000    1.6939           3.406   36.72     <0.0001

(d) T grouping of the dietary least squares means (α = 0.05)
LS means with the same letter are not significantly different
Diet   Estimate   Group
III    62.2000    A
I      54.3667    B
II     54.2000    B

Table 1.26 Analysis of variance under a completely randomized design: Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
Diet     2        6        7.28      0.0248
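The RCBD and CRD F-tests in Tables 1.25 and 1.26 can be reproduced by hand from the data in Table 1.23. The Python/NumPy sketch below computes the sums of squares, the ANOVA estimate of the litter variance, and both P-values. To avoid a SciPy dependency, it uses the closed-form upper tail of an $F(2, d)$ distribution, $P(F > f) = (1 + 2f/d)^{-d/2}$, which is valid only because the numerator has $t - 1 = 2$ degrees of freedom; the helper name `f2_upper_tail` is ours.

```python
import numpy as np

w = np.array([[54.3, 53.1, 59.7],
              [53.6, 52.4, 59.7],
              [55.2, 57.1, 67.2]])    # Table 1.23: rows = litters (blocks), columns = diets
b, t = w.shape
grand = w.mean()

ss_litter = t * np.sum((w.mean(axis=1) - grand) ** 2)
ss_diet = b * np.sum((w.mean(axis=0) - grand) ** 2)
ss_error = np.sum((w - grand) ** 2) - ss_litter - ss_diet

def f2_upper_tail(f, d):
    """P(F > f) for an F(2, d) variable (closed form; numerator df must be 2)."""
    return (1 + 2 * f / d) ** (-d / 2)

# RCBD: litters in the model
df_e = (b - 1) * (t - 1)
mse = ss_error / df_e
f_rcbd = (ss_diet / (t - 1)) / mse
p_rcbd = f2_upper_tail(f_rcbd, df_e)
s2_litter = (ss_litter / (b - 1) - mse) / t    # ANOVA estimate of the litter variance

# CRD: litters omitted, so the litter sum of squares is pooled into error
df_crd = b * (t - 1)
mse_crd = (ss_litter + ss_error) / df_crd
f_crd = (ss_diet / (t - 1)) / mse_crd
p_crd = f2_upper_tail(f_crd, df_crd)

print(f"RCBD: MSE={mse:.4f}, litter var={s2_litter:.4f}, F={f_rcbd:.2f}, P={p_rcbd:.4f}")
print(f"CRD:  MSE={mse_crd:.4f}, F={f_crd:.2f}, P={p_crd:.4f}")
```

Because the design is balanced, these ANOVA estimates agree with the GLIMMIX output: the litter variance is about 5.3117, the MSE is about 3.2961, and the P-values are about 0.0091 (RCBD) and 0.0248 (CRD).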

Since the researcher wishes to make an inference about the entire population of
litters, the factor "litter" must be entered as a random effect; if litter is ignored,
the ability of the F-test to detect differences between treatments is diminished, as
the P-value increases from 0.0091 to 0.0248. Another way to see the importance of including
random effects in an ANOVA is to calculate the relative efficiency (RE) between the
two models.
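Both P-values come from an F distribution with 2 numerator degrees of freedom (three diets), for which the upper tail has the closed form P(F > f) = (1 + 2f/ν₂)^(−ν₂/2). The Python sketch below applies it to the CRD analysis; the denominator DF of 6 is an assumption (9 observations, 3 diets × 3 litters, minus 3 diet parameters).

```python
def f_pvalue_num_df2(f_value, den_df):
    """Upper-tail P(F > f_value) for an F(2, den_df) distribution.

    The closed form is valid only when the numerator DF equals 2."""
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)

# CRD analysis: F = 7.28 with ASSUMED Den DF = 6 (9 observations - 3 diets)
p_crd = f_pvalue_num_df2(7.28, 6)
print(round(p_crd, 4))  # 0.0249
```

The small difference from the reported 0.0248 is presumably due to F = 7.28 being rounded.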

Table 1.26 shows the results of the analysis of variance under a completely
randomized design (CRD), i.e., under the model $y_{ij} = \mu + \text{diet}_i + \varepsilon_{ij}$, which ignores litter.

In this case, if the experiment had been analyzed under a CRD, then the relative
efficiency (RE) of an RCBD with respect to a CRD would be:

$$RE = \frac{CME_{CRD}}{CME_{RCBD}} = \frac{(b-1)\,MSB_{RCBD} + b(t-1)\,CME_{RCBD}}{(bt-1)\,CME_{RCBD}}$$

where $CME_{CRD}$ is the mean squared error under a CRD; $CME_{RCBD} = SSE_{RCBD}/[(b-1)(t-1)]$ is the mean
squared error under an RCBD, with $SSE_{RCBD}$ the sum of squares of errors in an RCBD;
$MSB_{RCBD} = SSB_{RCBD}/(b-1)$ is the mean square due to blocks, with $SSB_{RCBD}$ the sum of
squares due to blocks in an RCBD; and $t$ and $b$ are the number of treatments and blocks,
respectively. If blocks are not useful, then the RE would be equal to 1. The higher
the RE, the more effective the blocking is in reducing the error variance. This value
can be interpreted as the ratio $r/b$, where $r$ is the number of experimental units
that would have to be assigned to each treatment if a CRD were used instead of
an RCBD.

Table 1.27 Fit statistics of a CRD and an RCBD
Fit statistics                  CRD      RCBD
-2 Res log likelihood           33.24    31.01
AIC (smaller is better)         41.24    35.01
AICC (smaller is better)        81.24    39.01
BIC (smaller is better)         40.41    33.20
CAIC (smaller is better)        44.41    35.20
HQIC (smaller is better)        37.90    31.38
Pearson's chi-square            51.65    19.78
Pearson's chi-square / DF        8.61     3.30

In Table 1.27, we can observe the mean squared error (MSE) of a CRD and an
RCBD (Pearson's chi-square / DF), obtained with the GLIMMIX procedure in SAS,
as well as a series of fit statistics.

The MSE values for a CRD and an RCBD are 8.61 and 3.30, respectively. Substituting
these values into the above equation, we obtain

$$RE = \frac{CME_{CRD}}{CME_{RCBD}} = \frac{8.61}{3.30} = 2.609.$$

This value indicates that an RCBD is 2.609 times more efficient than a CRD. In
other words, a CRD would have required about 8 experimental units per treatment
($r = 2.609 \times 3 \approx 7.8$, rounded up to 8) to obtain the same MSE as that
obtained in an RCBD with 3 litters (blocks).
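The arithmetic above can be sketched in Python as follows. The simple ratio uses the Pearson chi-square/DF values from Table 1.27, and b = 3 is the number of litters implied by the calculation in the text. The Cochran-style function implements the general RE formula given earlier; its block mean square is not reported in this section, so it is only shown checking the limiting case MSB = MSE, where RE should equal 1.

```python
import math

def relative_efficiency(mse_crd, mse_rcbd):
    """RE of an RCBD relative to a CRD as the ratio of error mean squares."""
    return mse_crd / mse_rcbd

def relative_efficiency_cochran(msb, mse, b, t):
    """RE = [(b-1)MSB + b(t-1)MSE] / [(bt-1)MSE] for b blocks and t treatments."""
    return ((b - 1) * msb + b * (t - 1) * mse) / ((b * t - 1) * mse)

def units_needed_crd(re, b):
    """Units per treatment a CRD would need (r = RE * b), rounded up."""
    return math.ceil(re * b)

re = relative_efficiency(8.61, 3.3)   # values from Table 1.27
r = units_needed_crd(re, b=3)         # b = 3 litters (blocks)
print(f"RE = {re:.3f}; a CRD would need about {r} units per treatment")
# RE = 2.609; a CRD would need about 8 units per treatment
```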


1.6 Exercises 


Exercise 1.6.1 The following dataset corresponds to the growth of pea plants, in
ocular units (1 ocular unit = 0.114 mm), in tissue culture with auxins. The purpose of this experiment
was to test the effects of adding various types of sugars to the culture medium
on growth in length. Pea plants were randomly assigned to one of five treatments:
control (no sugar), 2% glucose, 2% fructose, 1% glucose + 1% fructose,
and 2% sucrose. A total of 10 observations were taken in each treatment,
and the measurements are assumed to be approximately normally distributed with
constant variance. Here, the individual plants to which the treatments were applied are
the experimental units. The data from this experiment are shown below (Table 1.28):
Table 1.28 Growth of pea plants in a culture medium with auxins and different types of sugars


Plant | Control 2% Glucose 2% Fructose 1% Glucose +1% fructose 2% Sucrose 
1 75 57 58 58 62 
2 67 58 61 59 66 
3 70 60 56 58 65 
4 75 59 58 61 63 
5 65 62 57 57 64 
6 71 60 56 56 62 
7 67 60 61 58 65 
8 67 57 60 57 62 
9 76 59 57 57 62 
10 68 61 58 59 67 


Table 1.29 Growth (height in centimeters) of two forage species with three
types of fertilizers plus a control

                    Fertilizer
Species    Control    F1      F2      F3
A          21         32      22.5    28
A          19.5       30.5    26      27.5
A          22.5       25      28      31
A          21.5       27.5    27      29.5
A          20.5       28      26.5    30
A          21         28.6    25.2    29.2
B          23.7       30.1    30.6    36.1
B          23.8       28.9    30.6    36.1
B          23.8       30.9    28.1    38.7
B          23.7       34.4    34.9    37.1
B          22.8       32.7    30.1    36.8
B          24.4       32.7    25.5    37.1


(a) Write the statistical model that best describes this dataset, indicating its 
components. 

(b) Calculate the analysis of variance for this experiment. 

(c) Is there any significant difference between treatments on average plant growth? 


Exercise 1.6.2 A forage company wants to test three different types of fertilizers
(F1, F2, and F3) for the production of two forage species (A and B) for cattle and
compare them with a fertilizer they usually apply, which we will call the control. For
this, the company uses 48 pots in the greenhouse, with 6 replications of each
combination of fertilizer and forage species. The data from this experiment are
shown in Table 1.29:
(a) Write and describe the statistical model of the experimental design with all its
components.

(b) Calculate the analysis of variance for this experiment.

(c) Is there any significant difference between treatments on average plant growth?


Exercise 1.6.3 The data in this experiment come from plants regrowing after
grazing by sheep or goats. The initial size of the plant at the top of its rootstock is
recorded, and the weight of seeds (g) that it produces at the end of the season is the
response or dependent variable. The data for this experiment are as follows
(Table 1.30):


(a) List and describe all the components of the linear mixed model. 
(b) Calculate the ANOVA for this dataset and answer the following questions: 


Is seed weight influenced by the type of grazing? 
Is seed weight influenced by the plant size? 
Is the effect of grazing type on seed weight influenced by the initial plant size?


Exercise 1.6.4 An experiment was conducted to study the effect of supplementation
of weaned lambs on health and growth rate when exposed to helminthiasis. A total of
16 Dorper (breed 1) and 16 Red Maasai (breed 2) lambs were treated with an
anthelmintic at 3 months of age (after weaning) and, within each breed, grouped into
4 “blocks” of 4 lambs on the basis of 3-month body weight. Within each block, two lambs
were randomly allocated to the supplemented group (night-fed cottonseed meal and wheat
bran) and two to the unsupplemented group. All lambs were kept on grazing for a further
3 months. Data recorded included the initial body weight (kilograms) at weaning, the
weight 3 months after weaning, the percentage red blood cell volume (PRBC), and the
fecal egg count (FEC) at 6 months of age. Data from this experiment are shown
below (Table 1.31):


(a) List and describe all the components of the linear mixed model. 
(b) Calculate the ANOVA for this dataset and answer the following questions: 


Did supplementation improve weight gain? Did supplementation affect PRBC 
and FEC, and were there differences in weight gain, PRBC, or FEC between breeds? 
Table 1.30 Fruit production after grazing

Size     Fruit    Grazing
6.225 59.77 No Grazing
6.487 60.98 No Grazing
4.919 14.73 No Grazing
5.13 19.28 No Grazing
5.417 34.25 No Grazing
5.359 35.53 No Grazing
7.614 87.73 No Grazing 
6.352 63.21 No Grazing 
4.975 24.25 No Grazing 
6.93 64.34 No Grazing 
6.248 52.92 No Grazing 
5.451 32.35 No Grazing 
6.013 53.61 No Grazing 
5.928 54.86 No Grazing 
6.264 64.81 No Grazing 
7.181 73.24 No Grazing 
7.001 80.64 No Grazing 
4.426 18.89 No Grazing 
7.302 75.49 No Grazing 
5.836 46.73 No Grazing 
10.253 116.05 Grazing 
6.958 38.94 Grazing 
8.001 60.77 Grazing 
9.039 84.37 Grazing 
8.91 70.11 Grazing 
6.106 14.95 Grazing 
7.691 70.7 Grazing 
8.988 80.31 Grazing 
8.975 82.35 Grazing 
9.844 105.07 Grazing 
8.508 73.79 Grazing 
7.354 50.08 Grazing 
8.643 78.28 Grazing 
7.916 41.48 Grazing 
9.351 98.47 Grazing 
7.066 40.15 Grazing 
8.158 52.26 Grazing 
7.382 46.64 Grazing 
8.515 71.01 Grazing 
8.53 83.03 Grazing 
Table 1.31 Supplementation trial in Dorper (breed 1) and Red Maasai (breed 2) lambs
Id Breed Sex Supplement Block IW FW PRBC FEC WG
349 1 2 1 1 8 6500 0.9
326 1 2 1 1 9 2650 1.1
393 1 1 1 2 12 750 0.6
71 1 1 1 2 12.3 5200 2.3
271 1 1 1 3 13 4800 0.7
382 1 2 1 3 15.5 2450 1.3
85 1 2 1 4 16.3 200 1.9
176 1 2 1 4 15.9 3000 1.8
286 1 2 2 1 11 1600 2.6
183 1 1 2 1 9.9 450 1.8
21 1 2 2 2 11.6 2900 1.5
122 1 1 2 2 12.5 300 2.3
374 1 1 2 3 14.6 2250 3.3
32 1 2 2 3 14.2 2800 2.7
282 1 2 2 4 16.3 750 3.9
94 1 1 2 4 16.7 5600 1
127 2 2 1 1 7.5 1350 0.6
216 2 2 1 1 8.2 1150 141
133 2 1 1 2 10.1 200 1.6
249 2 1 1 2 8.8 0 1.6
123 2 2 1 3 1.6 600 1
222 2 2 1 3 11.3 13.5 24 1500 2.2
290 2 2 1 4 12.3 14.3 22 1950 2
148 2 1 1 4 13.1 14.9 26 500 1.8
142 2 2 2 1 8.2 11.5 25 850 3.3
154 2 2 2 1 9.5 12.2 35 700 3.7
166 2 1 2 2 9.7 12.8 29 400 3.1
322 2 1 2 2 8.6 12 26 800 3.4
156 2 1 2 3 10.2 13 28 1550 2.8
161 2 2 2 3 11.2 14.6 22 550 3.4
321 2 1 2 4 12.1 15.9 25 1250 3.8
324 2 1 2 4 13.8 18.1 24 1100 4.3

IW initial weight, FW final weight, PRBC percentage of red blood cells, FEC fecal egg count, WG
weight gain
Appendix
Population    Plant  Stamens  Ovules  Total no. of flowers  Ratio (stamens/ovules)
St. Croix     1      30.75    13.75    8                    2.24
St. Croix     2      33.83    16.17   12                    2.09
St. Croix     3      35.67    16.33    6                    2.18
St. Croix     4      35.40    17.44   14                    2.03
St. Croix     5      33.50    23.50   13                    1.43
St. Croix     6      37.40    18.40   10                    2.03
St. Croix     7      33.57    21.20   25                    1.58
St. Croix     8      29.86    28.71   20                    1.04
St. Croix     9      33.80    29.60   17                    1.14
St. Croix     10     31.60    25.80   14                    1.22
St. Croix     11     32.57    27.50   21                    1.18
St. Croix     12     31.80    24.00   13                    1.33
St. Croix     13     35.25    17.75    8                    1.99
St. Croix     14     30.00    16.83   13                    1.78
St. Croix     15     30.50    18.75    9                    1.63
St. Croix     16     32.20    21.40   13                    1.50
St. Croix     17     32.40    26.25   12                    1.23
St. Croix     18     38.50    17.75    8                    2.17
St. Croix     19     37.00    25.83   16                    1.43
St. Croix     20     33.00    25.25    8                    1.31
St. Croix     21     31.40    25.20   15                    1.25
St. Croix     22     31.80    25.60   14                    1.24
St. Croix     23     30.40    19.20   15                    1.58
St. Croix     24     35.20    22.40   22                    1.57
St. Croix     25     27.80    20.80   10                    1.34
St. Croix     26     31.29    22.71   14                    1.38
St. Croix     27     32.83    22.33   20                    1.47
St. Croix     29     31.20    17.40   14                    1.79
St. Croix     30     33.00    19.20   13                    1.72
St. Croix     31     33.80    22.20   13                    1.52
St. Croix     32     32.22    27.63   31                    1.17
St. Croix     33     32.91    28.73   18                    1.15
St. Croix     34     34.50    15.75    9                    2.19
St. Croix     35     28.33    17.33    8                    1.63
St. Croix     36     30.71    23.14   14                    1.33
St. Croix     37     33.00    24.00   14                    1.38
St. Croix     38     31.00    20.50    4                    1.51
St. Croix     39     35.00    21.83   15                    1.60
St. Croix     40     35.00    18.00   10                    1.94
Cedar Creek   1      30.17    18.67   16                    1.62
Cedar Creek   2      32.43    15.14   23                    2.14
Cedar Creek   3      28.00    14.00   15                    2.00
Cedar Creek   4      29.22    16.89   35                    1.73
Cedar Creek   5      36.00    17.14   20                    2.10
Cedar Creek   6      30.83    20.17   15                    1.53
Cedar Creek   7      31.75    18.00   18                    1.76
Cedar Creek   8      29.25    19.00    8                    1.54
Cedar Creek   9      32.78    24.44   24                    1.34
Cedar Creek   10     32.67    22.83   17                    1.43
Cedar Creek   11     31.43    21.00   28                    1.50
Cedar Creek   15     33.50    29.50    4                    1.14
Cedar Creek   16     32.83    15.17   20                    2.16
Cedar Creek   17     35.00    15.00    9                    2.33
Cedar Creek   18     33.17    13.83   15                    2.40
Cedar Creek   19     33.29    27.14   28                    1.23
Cedar Creek   20     35.50    19.83   16                    1.79
Cedar Creek   21     35.71    18.86   21                    1.89
Cedar Creek   23     31.38    25.63    5                    1.22
Cedar Creek   25     28.25    17.50   11                    1.61
Cedar Creek   27     31.82    24.91   37                    1.28
Cedar Creek   28     35.13    26.88   23                    1.31
Cedar Creek   32     33.75    21.63   26                    1.56
Cedar Creek   33     32.00    20.80   14                    1.54
Cedar Creek   34     36.29    17.00   18                    2.13
Cedar Creek   35     28.60    16.40   11                    1.74
Cedar Creek   36     33.00    20.80   14                    1.59
Cedar Creek   37     34.90    25.11   49                    1.39
Cedar Creek   38     34.80    19.60   18                    1.78
Cedar Creek   40     30.00    21.17   16                    1.42
Cedar Creek   41     34.50    20.50   16                    1.68
Cedar Creek   42     37.75    29.00   18                    1.30
Cedar Creek   43     33.50    20.75   10                    1.61
Cedar Creek   44     33.00    22.40   12                    1.47
Cedar Creek   45     35.50    21.50   16                    1.65
Cedar Creek   46     32.50    22.00   14                    1.48
Cedar Creek   47     32.67    16.67    8                    1.96
Cedar Creek   48     35.75    21.50   26                    1.66
Cedar Creek   49     31.38    22.88   22                    1.37
Cedar Creek   50     33.83    20.50   17                    1.65

Data: Larkspur plants from two populations in the state of Minnesota
Chapter 2
Generalized Linear Models


2.1 Introduction


In the general linear model (which is not very general), $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the
response variables are normally distributed, with constant variance across the values
of all the predictor variables, and are linear functions of the predictor variables.
Transformations of data have been used to try to force the data into a normal linear
regression model, or to find a transformation of a non-normal response variable
(discrete, categorical, positive continuous scale, etc.) that is linearly related to the
predictor variables; however, this is no longer necessary. For example, instead of a normal
distribution, a positively skewed distribution whose values are positive real
numbers can be selected. Generalized linear models (GLMs) go beyond general linear
models: they allow response variables that are not normally distributed (and not
necessarily continuous), they allow the variance to change with the mean
(heteroscedasticity), and they assume a linear relationship between a function of the
mean of the response variable (the link function) and the predictor or explanatory variables.

Nelder and Wedderburn (1972) implemented a unified methodology for linear 
models, thus opening a window for researchers to design models that can explain the 
variation of the phenomenon under study. Later, McCullagh and Nelder (1989)
presented a comprehensive treatment of this extension of linear models, called generalized linear models (GLMs).
They pointed out that the key elements of a classical linear model are as follows:
(i) the observations are independent, (ii) the mean of the observation is a linear
function of some covariates, and (iii) the variance of the observation is a constant. To
extend these, points (ii) and (iii) are modified as follows: (ii*) the mean of the
observation is associated with a linear function of some covariates via a link function,
and (iii*) the variance of the observation is a function of the mean. For more details,
see McCullagh and Nelder (1989). GLMs can be adapted to a wide
variety of response variables. Special cases of GLMs include not only regression and 
analysis of variance (ANOVA) but also logistic regression, probit models, Poisson 
regression, log-linear models, and many more. 
2.2 Components of a GLM


The construction of a GLM begins with choosing the distribution of the response
variable, the predictor or explanatory variables to include in the systematic compo- 
nent, and how to connect the mean of the response to the systematic component. The 
three important components are described in the following sections: 


2.2.1 The Random Component


The first component to specify is the random component, which consists of choosing 
a probability distribution for the response variable. This can be any member of the 
exponential family of distributions, such as normal, binomial, Poisson, gamma, and
so on. 


2.2.2 The Systematic Component 


The second component of a GLM is the systematic component or linear predictor,
which consists of a linear combination of explanatory variables (the predictors). The
systematic component of a model is the fixed structural part of the model that
explains the systematic variability between means. The linear predictor is found on
the right-hand side of the equation in the specification of a linear or nonlinear
regression model. Let $x_1, x_2, \ldots, x_k$ be the numerical (continuous) or discrete
(categorical, coded as dummy variables) predictor (explanatory) variables; then the linear predictor is

$$\eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} = \mathbf{x}_i^{T}\boldsymbol{\beta}$$

where $\boldsymbol{\beta}^{T} = (\beta_0, \beta_1, \beta_2, \ldots, \beta_k)$ is the vector of regression parameters
and $\mathbf{x}_i^{T} = (1, x_{i1}, x_{i2}, \ldots, x_{ik})$ is the vector of predictor variables. Although $\eta_i$ is a
linear function of the parameters, the $x$'s can be nonlinear in form. For example, $\eta_i$ can be a quadratic,
cubic, or higher-order polynomial in a covariate. The expected value of $y_i$ and the linear predictor
$\eta_i$ are related through the link function. For example, in a Poisson GLM, the linear
predictor is equal to $\log(\lambda_i) = \mathbf{x}_i^{T}\boldsymbol{\beta}$, since the link is the natural logarithm, better
known as the log link.

In normal linear regression models, the focus is on y and on finding the predictors or
explanatory variables that best explain or predict the mean of the response variable. 
This is also important in a GLM. Problems such as multicollinearity in normal linear 
regression are also problems in generalized linear models. 
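To make the definitions concrete, the Python sketch below computes a linear predictor $\eta_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$ and maps it to a Poisson mean through the inverse of the log link. The coefficients and covariate values are hypothetical, chosen only for illustration; the covariate vector includes a squared term to show that $\eta_i$ stays linear in $\boldsymbol{\beta}$ even when a predictor enters nonlinearly.

```python
import math

def linear_predictor(beta, x):
    """eta_i = x_i^T beta, where x_i = (1, x_i1, ..., x_ik) includes the intercept."""
    return sum(b * xj for b, xj in zip(beta, x))

def poisson_mean(eta):
    """Inverse of the log link: lambda_i = exp(eta_i)."""
    return math.exp(eta)

# Hypothetical values: a quadratic in one covariate t, coded as (1, t, t^2)
t = 3.0
beta = [0.5, 0.2, -0.1]      # (beta_0, beta_1, beta_2)
x_i = [1.0, t, t ** 2]       # (1, x_i1, x_i2)

eta = linear_predictor(beta, x_i)
print(f"eta = {eta:.3f}, lambda = {poisson_mean(eta):.3f}")
```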
2.2.3 Predictor’s Link Function


Finally, we will look at the specification of the link function that maps the mean of
the response variable to the linear predictor. The link function allows a nonlinear
relationship between the mean of the response variable and the linear predictor;
this link $g(\cdot)$ connects the mean of the response variable with the linear predictor.
That is,

$$g(\mu) = \eta$$

The function must be monotonic (and differentiable). The mean is equal, in turn,
to the inverse transformation of $g(\cdot)$, that is,

$$\mu = g^{-1}(\eta)$$

The most natural and meaningful way to interpret the model parameters is in
terms of the scale of the data. In other words,

$$\mu_i = g^{-1}(\mathbf{x}_i^{T}\boldsymbol{\beta}) = g^{-1}(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik})$$


It is important to note that the link relates the mean of the response to the linear
predictor and that this is different from transforming the response variable
directly. If the response variable is transformed (i.e., $g(y)$ is analyzed instead of $y$), then a distribution
must be selected that describes the population distribution of the transformed
data, which makes interpretation on the original scale of the data more difficult. A transformation
of the mean is generally not equal to the mean of the transformed values, that
is, $g(E[y]) \neq E[g(y)]$. For example, suppose we have a distribution with the following
values (and probabilities):

y              1       2       3       4
Prob(Y = y)    0.125   0.375   0.375   0.125

The mean of this distribution is $E[y] = 1 \times 0.125 + 2 \times 0.375 + 3 \times 0.375 + 4 \times 0.125 = 2.5$.
Therefore, the logarithm of the mean of this distribution is $\ln(E[y]) = \ln(2.5) = 0.916$,
whereas the mean of the logarithm is $E[\ln(y)] = 0.845$.
The value of the linear predictor $\eta$ could potentially be any real number, but the expected
values of the response variable, as in the case of counts or proportions, can be
bounded. If there are no restrictions on the response variable (positive or negative
real numbers), then the "identity link" function could be used, where the mean is
identical to the linear predictor, that is,

$$\mu = \eta.$$
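The difference between $\ln(E[y])$ and $E[\ln(y)]$ in the four-point example above can be checked directly. The short Python sketch below uses only the values and probabilities given in the text, with $g = \ln$.

```python
import math

# Distribution from the example: values and their probabilities
values = [1, 2, 3, 4]
probs = [0.125, 0.375, 0.375, 0.125]

mean_y = sum(y * p for y, p in zip(values, probs))                  # E[y]
log_of_mean = math.log(mean_y)                                      # g(E[y])
mean_of_log = sum(math.log(y) * p for y, p in zip(values, probs))   # E[g(y)]

print(f"E[y] = {mean_y}, ln(E[y]) = {log_of_mean:.3f}, E[ln y] = {mean_of_log:.3f}")
# E[y] = 2.5, ln(E[y]) = 0.916, E[ln y] = 0.845
```

Because the logarithm is concave, $E[\ln(y)]$ is never larger than $\ln(E[y])$ (Jensen's inequality), which is why modeling the transformed data is not the same as modeling the mean through a link.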
As mentioned before, the link function establishes a connection between the
linear predictor $\eta$ and the mean of the distribution $\mu$. Although the link
function is in some cases similar to a transformation, it establishes only a mathematical
connection between the parameters of the model (the mean and the linear predictor).
A transformation, in contrast, is applied to the observations to better understand the
relationship between the mean and the response variables or, in some cases, to
stabilize the variance. Special cases are mentioned below:


(a) For a normal distribution, the link function is the identity function, $\eta = \mu$; the
variance function is constant, i.e., $\text{Var}(\mu) = 1$; and the scale parameter is the
variance, i.e., $\phi = \sigma^2$. This allows the use of ordinary least squares in parameter
estimation in procedures such as linear regression, analysis of variance
(ANOVA) models, or analysis of covariance (ANCOVA) models.

(b) In a binomial distribution, the response variable takes binary values like 0 and
1 or represents the relative frequency, i.e., $y_i = e_i/n_i$, where $e_i$ is the number of
successes and $n_i$ is the number of trials. The mean is a probability ($\pi$) and
therefore must be between 0 and 1, whereas the linear predictor is not bounded.
Therefore, the inverse link function must map the real line onto the interval (0, 1). A
natural link function for binomial data is the logit link:

$$\eta = \log\left(\frac{\pi}{1-\pi}\right) \;\Longleftrightarrow\; \pi = \frac{e^{\eta}}{1+e^{\eta}}$$

Another useful alternative for these types of data is the probit link function:

$$\eta = \Phi^{-1}(\pi) \;\Longleftrightarrow\; \pi = \Phi(\eta)$$

where $\Phi$ is the cumulative distribution function of a standard normal distribution.
The variance function has the form $\text{Var}(\pi) = \pi(1-\pi)$, and the scale
parameter $\phi$ is known and equal to 1 ($\phi = 1$). The difference between the logit
and probit estimates matters mainly when the estimated probabilities are extremely
small or extremely close to 1, and large sample sizes are required to detect it.
Both the logit and probit functions produce extremely close or
equivalent results, especially with probability values around 0.5.


(c) For a Poisson distribution, the link function is the natural log:

$$\eta = \log(\lambda) \;\Longleftrightarrow\; \lambda = e^{\eta}$$

The variance function has the form $\text{Var}(\lambda) = \lambda$, and, similar to the binomial
distribution, the scale parameter is 1. Poisson models with a log link function are
often referred to as log-linear models, commonly used to analyze contingency
tables (frequency data) with two or more classification factors.
(d) A gamma distribution has a link function of the form (the inverse, or reciprocal, link):

$$\eta = \frac{1}{\mu} \;\Longleftrightarrow\; \mu = \frac{1}{\eta}$$

The variance function is given by $\text{Var}(\mu) = \mu^2$, and the scale parameter $\phi$ is
usually unknown. In some cases, the log link function is used instead, which
results in an exponential inverse link. It should be noted that the inverse link
does not map the linear predictor into the range of admissible means (positive values),
because $\eta$ can take negative values. Therefore, given this limitation, the theory only
provides reasonable approximations for most applications. An exponential distribution
is a special case of the gamma distribution.
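The link functions in (a) to (d) and their inverses can be collected in a small lookup table. The Python sketch below is illustrative only: the dictionary layout is our own choice, and the probit link uses the standard library's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# Each entry: (link g: mean -> eta, inverse link g^{-1}: eta -> mean)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),                 # normal
    "logit": (lambda p: math.log(p / (1 - p)),                    # binomial
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (STD_NORMAL.inv_cdf, STD_NORMAL.cdf),               # binomial
    "log": (math.log, math.exp),                                  # Poisson
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),          # gamma
}

# The logit and probit scales differ, although both map (0, 1) onto the real line
for p in (0.01, 0.5, 0.99):
    logit_eta = LINKS["logit"][0](p)
    probit_eta = LINKS["probit"][0](p)
    print(f"pi={p}: logit eta={logit_eta:.3f}, probit eta={probit_eta:.3f}")
```

Because the two links put $\eta$ on different scales, logit and probit coefficients are not directly comparable, even when the fitted probabilities are very similar.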


Previously, before the advances in computational methods, the classical methods for
working with non-normal data consisted of directly transforming the response
variable; that is, the data were transformed using a function $f(y)$ before
being analyzed. The goal of the transformation was to obtain a simple connection
between the mean and the linear predictor. However, obtaining a constant scale of
variation when selecting a transformation is also vitally important. The usual way of
selecting a suitable transformation is based on the assumption that, within the region
of variation of the random variable, the transformation can be adequately approximated
by a linear function around the mean. That is, if the random
variable $y$ has a distribution with mean $\mu$ and variance $\sigma^2(\mu)$, we want to find a
transformation $f(y)$ whose variance is approximately constant (i.e., that stabilizes the
variance). The functions commonly used to stabilize the variance are the square root
($\sqrt{y}$) when data have a Poisson distribution, the arcsine square root when data are
binomial proportions, and the logarithmic transformation for data with a constant coefficient of
variation.
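The variance-stabilizing effect of the square root on Poisson data can be verified numerically. The Python sketch below computes the mean and variance of $y$ and of $\sqrt{y}$ for $y \sim \text{Poisson}(\lambda)$ by summing the probability mass function (truncated far in the tail). The raw variance equals $\lambda$, while the variance of $\sqrt{y}$ stays close to 1/4 for moderate and large $\lambda$.

```python
import math

def poisson_moments(lam, transform):
    """Mean and variance of transform(Y) for Y ~ Poisson(lam), via the pmf."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)   # tail beyond this is negligible
    p = math.exp(-lam)                             # P(Y = 0)
    m1 = m2 = 0.0
    for k in range(upper + 1):
        if k > 0:
            p *= lam / k                           # P(Y = k) from P(Y = k - 1)
        v = transform(k)
        m1 += p * v
        m2 += p * v * v
    return m1, m2 - m1 * m1

for lam in (10, 20, 50):
    _, var_raw = poisson_moments(lam, lambda y: y)
    _, var_sqrt = poisson_moments(lam, math.sqrt)
    print(f"lambda={lam}: Var(y)={var_raw:.2f}, Var(sqrt(y))={var_sqrt:.3f}")
```

The approximation improves as $\lambda$ grows; for small counts, the square root does not fully stabilize the variance.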

Table 2.1 provides an overview of the most common link functions that will give 
admissible values for certain types of response variables and the corresponding 
inverse of the link function. 


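Table 2.1 Common link functions for different response variables

Type of response     Mean        Variance                  $g(\mu) = \eta$                         $g^{-1}(\eta)$
Normal               $\mu$       $\sigma^2$                $\mu$ (identity)                        $\mu = \eta$
Poisson              $\lambda$   $\lambda$                 $\log(\lambda)$                         $\lambda = e^{\eta}$
Binomial ratio       $\pi$       $\pi(1-\pi)/N$            (logit) $\log[\pi/(1-\pi)]$             $\pi = e^{\eta}/(1+e^{\eta})$
                                                           (probit) $\Phi^{-1}(\pi)$               $\pi = \Phi(\eta)$
Exponential          $\mu$       $\mu^2$                   $\log(\mu)$                             $\mu = e^{\eta}$
Gamma                $\mu$       $\phi\mu^2$               (inverse) $1/\mu$                       $\mu = 1/\eta$
Negative binomial    $\lambda$   $\lambda + \phi\lambda^2$ $\log(\lambda)$                         $\lambda = e^{\eta}$

Note: $\Phi$ is the cumulative distribution function of a standard normal distribution; $\mu$, $\lambda$, and $\pi$ are the
expected values of the response; $\eta$ is the linear predictor; and $\phi$ is the scale parameter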
2.3 Assumptions of a GLM


According to McCullagh and Nelder (1989) and Agresti (2013, Chap. 4), a GLM is
defined under the following assumptions:

(a) The data $y_1, y_2, \ldots, y_n$ are independent.

(b) The response variable $y_i$ does not necessarily have to have a normal distribution,
but we usually assume a distribution from an exponential family (e.g., binomial,
Poisson, multinomial, gamma, etc.).

(c) A GLM does not assume a linear relationship between the dependent variable
and the independent variables, but it does assume a linear relationship between
the mean of the response, transformed by the link function, and the explanatory
variables; for example, for binary logistic regression,
$\text{logit}(\pi) = \beta_0 + \beta_1 x$.

(d) The predictor (explanatory) variables may be powers or other
nonlinear transformations of the original independent variables.

(e) The assumption of homogeneity of variance need not be satisfied. In fact, it is not
possible in many cases, given the structure of the model and the presence of
overdispersion (when the observed variance is larger than what the model
assumes).

(f) Errors are independent but need not be normally distributed.

(g) The parameters are estimated by maximum likelihood (ML) or other methods instead of
ordinary least squares (OLS).


2.4 Estimation and Inference of a GLM 


Estimators of the regression coefficients for linear models with a normal response are
obtained using least squares or ML, and significance tests generally compare the
residual sums of squares of models under different hypotheses using the F-test. It
is worth mentioning that these tests are exact, so no approximations are
required for their implementation.

GLMs offer a natural extension of this situation in the sense that: (1) the
computations used to determine the ML estimates of the regression
parameters/coefficients are highly similar to those used when the response is
normal, with the difference that the estimation process is iterative, producing
successive approximations that converge to the ML estimates; and (2) in
inference procedures, the test statistic commonly used is the likelihood ratio test,
which parallels the F-tests in linear models with a normal response. Thus, GLMs
provide a uniform method of estimation and inference. Estimation of the parameters
$\boldsymbol{\beta}$ is based on ML, whereas the inference methods are generally
approximations, since they rely on large-sample (asymptotic) distribution theory,
as in the case of the likelihood ratio test. Alternative tests include the Wald test,
the score test, and the likelihood ratio test.
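The iterative estimation described above is commonly carried out by iteratively reweighted least squares (IRLS, equivalent to Fisher scoring). The Python sketch below applies it to a Poisson GLM with a log link and one covariate; it is a minimal illustration of the standard algorithm, not a description of any particular software's internals, and the counts in the usage example are hypothetical. Each iteration solves a weighted least-squares problem with working response $z_i = \eta_i + (y_i - \mu_i)/\mu_i$ and weights $w_i = \mu_i$.

```python
import math

def irls_poisson(x, y, tol=1e-10, max_iter=50):
    """Fit log(mu_i) = b0 + b1*x_i to counts y by IRLS (Fisher scoring)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        # Accumulate X'WX (symmetric 2x2) and X'Wz (2-vector)
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)          # inverse of the log link
            w = mu                      # (dmu/deta)^2 / Var(y) = mu for Poisson
            z = eta + (yi - mu) / mu    # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det   # solve the 2x2 normal equations
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical counts for two groups (x = 0 and x = 1) with means 3 and 9
b0, b1 = irls_poisson([0, 0, 0, 1, 1, 1], [2, 3, 4, 6, 9, 12])
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

For this two-group design, the ML estimates have a closed form, $\hat\beta_0 = \log(3)$ and $\hat\beta_1 = \log(9/3)$, which makes the result easy to check.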
2.5 Specification of a GLM


In the following examples, we will describe the components of a GLM for some 
normal, gamma, binomial, and Poisson regression models. 


2.5.1 Continuous Normal Response Variable 


In simple linear regression models, the expected value of a continuous
response variable depends on an explanatory variable, as follows:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

Equivalently,

$$E(y_i \mid x_i) = \beta_0 + \beta_1 x_i$$

This GLM can be expressed in terms of its three components:

Distribution: $y_i \sim N(\mu_i, \sigma^2)$, with $E(y_i) = \mu_i$ and $\text{Var}(y_i) = \sigma^2$
Linear predictor: $\eta_i = \beta_0 + \beta_1 x_i$
Link function: $\eta_i = \mu_i$ (identity link)

where $\beta_0$ and $\beta_1$ are the intercept and slope, respectively. This means that we are
expressing the linear model as a GLM.


Example 1 A simple linear regression analysis was performed on the diamond price
(y) as a function of the weight in carats (Table 2.2), assuming that the response
variable y has a normal distribution with mean $\beta_0 + \beta_1 x_i$ and variance $\sigma^2$.

The basic Statistical Analysis Software (SAS) syntax for simple linear regression 
is as follows: 


proc reg;
   model price=weight / clb p r;
   output out=diag p=pred r=resid;
   id weight;
run;


In the above program, “proc reg” invokes the linear regression procedure in SAS.
The “clb” option generates confidence intervals for the slope and intercept. The “p”
option prints the fitted (predicted) values, and the “r” option performs a residual
analysis (i.e., helps check the assumptions). The “output out” statement generates a new
dataset called “diag” containing the residuals and the predicted (fitted) values. The
“id weight” statement adds the specified variable to the fitted values output.

Table 2.2 Diamond price (dollars) based on weight (carats)

Weight  Price | Weight  Price | Weight  Price | Weight  Price | Weight  Price
0.17    355   | 0.18    462   | 0.18    468   | 0.17    …     | 0.25    655
0.16    328   | 0.28    823   | 0.16    345   | 0.32    918   | 0.35    1086
0.17    …     | 0.16    336   | 0.17    352   | 0.32    919   | 0.18    443
0.18    …     | 0.20    498   | 0.16    332   | 0.15    298   | 0.25    678
0.25    642   | 0.23    595   | 0.17    353   | 0.16    339   | 0.25    675
0.16    342   | 0.29    860   | 0.18    438   | 0.16    338   | 0.15    287
0.15    322   | 0.12    223   | 0.17    318   | 0.23    595   | 0.26    693
0.19    485   | 0.26    663   | 0.18    419   | 0.23    553   | 0.15    316
0.21    483   | 0.25    …     | 0.17    346   | 0.17    345   | 0.43    …
0.15    …     | 0.27    …     | 0.15    …     | 0.33    945   | …       …

Part of the results is shown in Table 2.3. The estimated parameters, obtained from
“proc reg,” are shown below:

Table 2.3 Regression analysis results

Estimated parameters
Parameter    Effect       Estimate    Standard error    Pr > |t|
$\beta_0$    Intercept    -259.63     17.3189           <0.0001
$\beta_1$    Weight       3721.02     81.7859           <0.0001
$\sigma^2$   Scale        1013.82     211.40

Note that the estimated parameters are all significantly different from
zero. Then, the linear predictor takes the form:

$$\eta = -259.63 + 3721.02 \times \text{weight}$$

If the model does not fit the data well, then the normal distribution may poorly
represent the response distribution; that is, it would weakly explain the
variability of the data. Consequently, the identity may not be the best link
function, the linear predictor may not include all the relevant information, or
some combination of the three components of the GLM may be misspecified. Although other
goodness-of-fit statistics exist for the linear regression model, such as the coefficient of determination
($R^2$), residual analysis is used to determine whether the model fits well
and whether the assumptions of a Gaussian model are met. In this example,
$R^2 = 0.9783$, which indicates that the model explains
97.83% of the total variability of the dataset. In Fig. 2.1, we can see that the simple
linear regression model provides a good fit to this dataset.
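The quantities in this example (intercept, slope, $R^2 = 1 - SSE/SST$, and fitted values on the identity scale) can be sketched in Python without any statistics package. The `simple_ols` helper below is a hypothetical illustration and is not tied to PROC REG's output format; the predicted price uses the fitted linear predictor reported above, and because the link is the identity, $\hat\mu = \hat\eta$.

```python
def simple_ols(x, y):
    """Least-squares intercept, slope, and R^2 for y = b0 + b1*x + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst

def predicted_price(weight):
    """Fitted linear predictor for the diamond data (identity link: mu = eta)."""
    return -259.63 + 3721.02 * weight

print(f"{predicted_price(0.20):.2f}")  # 484.57
```

Applied to the complete weight and price columns, `simple_ols` would return the least-squares estimates; it is not run on Table 2.2 here because several cells of that table are missing.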
Fig. 2.1 A dot plot of price vs. weight (carats) and fitted model
2.5.2 Binary Logistic Regression 


Logistic regression and other binomial response models are widely used in research
areas like biological sciences and agriculture. Given their importance, some relevant
features of these models are described in this section.

Let $y_i$ be the observed response on a set of $p$ explanatory variables $x_1, x_2, \ldots, x_p$,
where $y_i$ is binomially distributed with $n_i$ independent Bernoulli trials and probability
of success $\pi_i$ on each trial, i.e.,

$$y_i \sim \text{Binomial}(n_i, \pi_i)$$


Then, we can model the response using a GLM with a binomial response. The
linear predictor in this case will be equal to

$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$$

commonly known as the “logit” because the logit is defined as

$$\text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$$

which models the logarithm of the odds, $\pi_i/(1-\pi_i)$, as a function of the predictor
variables. The components of this GLM for binomial data are:


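Distribution: $y_i \sim \text{Binomial}(n_i, \pi_i)$, with mean and variance
$E(y_i) = n_i\pi_i$ and $\text{Var}(y_i) = n_i\pi_i(1-\pi_i)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$
Link function: $\eta_i = \text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right)$ (logit link)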


Another very useful link function, particularly for data from designed experiments, is the “probit” link $\eta_i = \Phi^{-1}(\pi_i)$, which was mentioned before, where $\Phi$ is the standard normal cumulative distribution function.

The basic GLM for binomial data under the probit link is almost identical to the one under the logit link:


Distribution: $y_i \sim \text{Binomial}(n_i, \pi_i)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$
Link function: $\eta_i = \operatorname{probit}(\pi_i) = \Phi^{-1}(\pi_i)$
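A short Python sketch comparing the two inverse links. It relies on the standard identity $\Phi(z) = \tfrac12[1 + \operatorname{erf}(z/\sqrt2)]$ for the standard normal CDF; dividing by 1.7 is a common rule of thumb for putting logit and probit coefficients on a comparable scale, an approximation that is not stated in this chapter.

```python
import math


def inv_probit(eta: float) -> float:
    """Standard normal CDF Phi(eta), i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_logit(eta: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Rule of thumb (approximation, not an identity): logit-scale coefficients are
# roughly 1.6 to 1.8 times probit-scale ones, so compare logit at eta with probit at eta / 1.7.
for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"eta = {eta:+.1f}   logit: {inv_logit(eta):.4f}   probit(eta/1.7): {inv_probit(eta / 1.7):.4f}")
```

Both curves pass through 0.5 at $\eta = 0$ and differ by less than about 0.01 on this grid, which is why the two links usually give very similar fitted probabilities.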


Example 1 An engineer is interested in studying the effect of temperature (Temp), from 0 to 40 °C, and time, from 0 to 15 days, on the germination of seeds of a certain crop. For this purpose, he placed seeds in different pots containing moist soil. After a certain number of days, the number of germinated seeds was counted. If a seed germinated, then $y = 1$; otherwise, $y = 0$. The probability of germination $\pi_{ij}$ can be modeled through

$$\eta_{ij} = \beta_0 + \beta_1 \text{Temp}_i + \beta_2 \text{Day}_j$$

where $\eta_{ij}$ is the linear predictor and $\beta_0$, $\beta_1$, and $\beta_2$ are the parameters to be estimated. In this GLM, the link function is the logit,

$$\eta_{ij} = \operatorname{logit}(\pi_{ij}) = \log\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right) = g(\pi_{ij})$$

and the probability in the interval (0, 1) is computed through the inverse of the link function:

$$\pi_{ij} = g^{-1}(\eta_{ij}) = \frac{\exp(\eta_{ij})}{1 + \exp(\eta_{ij})}$$

This last expression allows us to estimate the probability of germination ($\pi_{ij}$) under different temperature conditions (°C) and time periods (days). Note that the nonlinear relationship between the response $\pi_{ij}$ and the linear predictor $\eta_{ij}$ is modeled by the inverse of the link function, which in this particular case is the inverse of the logit.


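For the illustration of this example, a dataset was simulated using the values $\beta_0 = -8$, $\beta_1 = 0.19$ (temperature), and $\beta_2 = 0.37$ (days) in the linear predictor and the inverse of the link function, varying the temperature from 0 to 40 °C and time from 0 to 15 days, i.e.,

$$\pi_{ij} = \frac{1}{1 + e^{-(-8 + 0.19 \times \text{Temp}_i + 0.37 \times \text{Day}_j)}} = \frac{1}{1 + e^{8 - 0.19 \times \text{Temp}_i - 0.37 \times \text{Day}_j}}$$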


Part of the simulated data is shown below:

Temp | Days | Germ
0 | 0 | 0.000335
0 | 0.5 | 0.000403
0 | 1 | 0.000485
0 | 1.5 | 0.000584
0 | 2 | 0.000703
... | ... | ...
40 | 13 | 0.987991
40 | 13.5 | 0.989999
40 | 14 | 0.991674
40 | 14.5 | 0.99307
40 | 15 | 0.994234
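The simulated probabilities can be regenerated with a few lines of Python. This is a sketch under an assumption: the text does not state the temperature step, so a 0.5-unit grid for both temperature and days is assumed here (81 × 31 = 2511 rows, which agrees with the 2508 denominator degrees of freedom reported for the three-parameter model). The signs follow $\pi = 1/(1 + e^{8 - 0.19\,\text{Temp} - 0.37\,\text{Day}})$.

```python
import math

# Simulation values: beta0 = -8, beta_T = 0.19 (temperature), beta_D = 0.37 (days),
# i.e. pi = 1 / (1 + exp(8 - 0.19*Temp - 0.37*Day)).
BETA0, BETA_T, BETA_D = -8.0, 0.19, 0.37


def germination_prob(temp: float, day: float) -> float:
    """Inverse logit of the linear predictor."""
    eta = BETA0 + BETA_T * temp + BETA_D * day
    return 1.0 / (1.0 + math.exp(-eta))


# Assumed grid: 0.5-unit steps for both factors.
temps = [0.5 * i for i in range(81)]   # 0, 0.5, ..., 40
days = [0.5 * j for j in range(31)]    # 0, 0.5, ..., 15
germ = [(t, d, germination_prob(t, d)) for t in temps for d in days]

print(len(germ), "rows")
for row in germ[:5] + germ[-5:]:
    print("%4g %5g %.6f" % row)
```

The first and last five printed rows correspond to the rows of the excerpt above.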


The following commands allow us to perform a binomial regression using the “logit” and “probit” links, and a linear regression with the “identity” link. Note that we denote temperature as t and days as d in the code below.


Logit Regression

proc glimmix data=germ;
   model p = t d / solution dist=binomial link=logit cl;
   output out=logitout pred(noblup ilink)=predicted resid=residual;
run;
Probit Regression

proc glimmix data=germ;
   model p = t d / solution dist=binomial link=probit cl;
   output out=probitout pred(noblup ilink)=predicted resid=residual;
run;


Linear Regression (Identity)

proc glimmix data=germ;
   model p = t d / solution cl dist=normal;
   output out=identityout pred(noblup ilink)=predicted resid=residual;
run;


PROC GLIMMIX in SAS fits these models without modifying the response variable, as would happen if a transformation were applied directly to the response. Instead, GLIMMIX uses a link function of the mean of the response, which is modeled as having a linear relationship with the explanatory variables. The “model” statement specifies the response variable p as a function of the explanatory variables t and d, which define the design matrix X. The “solution” option in the model statement requests a listing of the fixed effects parameter estimates of the model ($\beta_0$, $\beta_1$, and $\beta_2$), and the “cl” option requests confidence limits for them. The “dist” option specifies the distribution of the response variable, and the “link” option specifies the link function.

To obtain predicted probability values for each observation, the “output” statement of PROC GLIMMIX is used. Two types of predicted values can be obtained with this statement: the first includes the solutions for the random effects (best linear unbiased predictors, BLUPs) in the linearized model, and the second is based on the fixed effects only (best linear unbiased estimators, BLUEs), requested with pred(noblup ilink)=predicted. The “ilink” sub-option of “pred” applies the inverse link function to the predicted values, that is, it returns the predictions on the probability scale, and these are stored in the variable predicted. Finally, the “resid” option requests the residuals of the regression, which are stored in the variable residual.

Table 2.4 shows part of the output of the regression procedure using the logit link function: the analysis of variance (part (a)) and the estimation and significance of the fixed effects (part (b)).


Table 2.4 (a) Type III tests of fixed effects
Effect | Num DF | Den DF | F-value | Pr > F
T | 1 | 2508 | 551.28 | <0.0001
D | 1 | 2508 | 407.19 | <0.0001
(b) Parameter estimates
Effect | Estimate | Standard error | DF | t-value | Pr > |t|
Intercept | −8.0000 | 0.3189 | 2508 | −25.08 | <0.0001
T | 0.1900 | 0.008092 | 2508 | 23.48 | <0.0001
D | 0.3700 | 0.01834 | 2508 | 20.18 | <0.0001
Table 2.5 Parameter estimates, linear predictor, and probability of linear, logit, and probit models
Link function | Parameter | Estimated value | $\hat\eta$ᵃ | $\hat\pi$ᵃ
... | ... | 0.106 | ... | ...
... | $\beta_1$ | 0.207 | ... | ...

ᵃThe linear predictor $\hat\eta$ and the probability $\hat\pi$ were estimated using D = 15 and T = 30
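As a worked check of the footnote's setting (D = 15, T = 30), the logit estimates of Table 2.4 give the linear predictor and probability below. Table 2.4 lists only the logit estimates, so this Python sketch covers the logit model only.

```python
import math

# Logit-link estimates from Table 2.4.
b0, b_t, b_d = -8.0000, 0.1900, 0.3700
T, D = 30, 15

eta_hat = b0 + b_t * T + b_d * D                        # linear predictor
pi_hat = math.exp(eta_hat) / (1 + math.exp(eta_hat))    # inverse logit
print(f"eta = {eta_hat:.4f}, pi = {pi_hat:.4f}")
```

Here $\hat\eta = -8 + 5.7 + 5.55 = 3.25$, so the estimated germination probability is about 0.96.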


Fig. 2.2 (a, b) Probability of seed germination as a function of temperature and day


In Table 2.5, the parameter estimates of the linear predictor for the linear (identity link), logit, and probit models are presented. The probabilities estimated by the probit and logit models are almost identical to each other, but those of the linear probability model are different. This is because the data were generated from a binomial (logistic) model, so the linear predictor estimated under the identity link differs substantially from the linear predictors under the probit and logit links.

In Fig. 2.2a, b, we observe that between 3 and 7 days and between 0 and 15 °C, seed germination is approximately 20%, but as both factors increase, the germination percentage also increases substantially.


2.5.2.1 Model Diagnosis 


For a linear model, a plot of the predicted values against the residuals is probably the 
simplest way to decide whether the model used provides a good fit to the data; but, 
for a GLM, we must decide on the appropriate scale to use for the fitted values. 
Fig. 2.3 Predicted vs. residual values using the logit link


Generally, it is better to use the linear predictors $\hat\eta$ in the plot rather than the predicted responses $\hat\mu$. If the plot does not show a linear pattern between the linear predictors and the residuals (for example, if it shows curvature), this could indicate a lack of fit of the model. For a linear model, we could transform the response variable, but this is not recommended for a GLM because it could change the response distribution. Another alternative would be to change the link function, but since few link functions allow a model to be interpreted easily, this is not a good option either. Likewise, changing the linear predictor or transforming the predictor variables would not be the best way to proceed.

Figures 2.3, 2.4, and 2.5 show the linear predictor versus the residuals (the predicted values versus the residuals can also be examined). By investigating the nature of the relationship between the predictors and the residuals in Fig. 2.3, we can see that there is a linear relationship between the predictor and the residual under the logit link, whereas the probit and identity links do not show this linear relationship. In particular, with the probit link we observe a curvilinear relationship between the predictor and the residual, which may be because homogeneity of variance is not satisfied under this link function. Therefore, the logit link is the best choice.


Example 2 Fruit flies can be a year-round problem in fruit-growing areas in many regions of the world, such as Mexico, and are most common in late summer and fall because ripe or fermented fruits and vegetables attract the insects by serving as a natural host. If these insects are not controlled, economic losses in fruit-growing areas can be large and devastating to producers. In response, entomologists have carried out experiments to help mitigate the damage caused by these insects. One such experiment attempted to establish the relationship between the concentration of a toxic agent (nicotine), applied over a 5-hour period, and the number of insects killed (common fruit fly); the data are shown in Table 2.6. For more information, see the study by Myers et al. (2002).

Fig. 2.4 Predicted values vs. residuals using the probit link

Fig. 2.5 Predicted vs. residual values using the identity link


The number of dead insects can be modeled with a binomial distribution, Binomial$(n_i, \pi_i)$. Let $y_i$ denote the number of dead insects at concentration $i$. The GLM components for this dataset are:
Table 2.6 Relationship between the concentration of a toxic agent and the number of fruit flies killed


Concentration (g/100 cc) | Number of insects (n) | Dead insects (y) | Proportion of dead insects
0.10 | 47 | 8 | 0.170
0.15 | 53 | 14 | 0.264
0.20 | 55 | 24 | 0.436
0.30 | 52 | 32 | 0.615
0.50 | 46 | 38 | 0.826
0.70 | 54 | 50 | 0.926
0.95 | 52 | 50 | 0.962


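Distribution: $y_i \sim \text{Binomial}(n_i, \pi_i)$, with mean and variance $E(y_i) = n_i\pi_i$ and $\text{Var}(y_i) = n_i\pi_i(1-\pi_i)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \text{conc}_i$
Link function: $\eta_i = \operatorname{logit}(\pi_i) = \log\left(\dfrac{\pi_i}{1-\pi_i}\right)$ (logit link)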


Note that we use $\text{conc}_i$ to denote the independent variable, the concentration of the nicotine toxicant. The following SAS code performs a binomial regression for the fruit fly dataset:


data fly;
   input conc n y;
   datalines;
0.1 47 8
0.15 53 14
0.2 55 24
0.3 52 32
0.5 46 38
0.7 54 50
0.95 52 50
;
run;

proc glimmix data=fly nobound;
   model y/n = conc / dist=binomial link=logit solution;
run;


The above syntax produces the following output: 

The analysis of variance (Table 2.7, part (a)) shows that nicotine concentration has a highly significant effect on the number of flies killed (P = 0.0004). In part (b), the maximum likelihood estimates of the intercept and slope are $\hat\beta_0 = -1.7361$ and $\hat\beta_1 = 6.2954$, respectively, which are used to construct the linear predictor:

$$\hat\eta_i = -1.7361 + 6.2954 \times \text{conc}_i$$
Table 2.7 Results of the (a) Type III tests of fixed effects
Effect | Num DF | Den DF | F-value | Pr > F
Conc | 1 | 5 | 71.94 | 0.0004
(b) Parameter estimates
Effect | Estimate | Standard error | DF | t-value | Pr > |t|
Intercept | −1.7361 | 0.2420 | 5 | −7.17 | 0.0008
Conc | 6.2954 | 0.7422 | 5 | 8.48 | 0.0004
Fig. 2.6 Proportion of dead insects as a function of nicotine concentration
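To show where the estimates in Table 2.7 come from, the following Python sketch fits the same binomial logit model by iteratively reweighted least squares (Fisher scoring), using only the standard library. It is an illustrative re-implementation, not the GLIMMIX algorithm, and `fit_logit_irls` is a hypothetical helper; with the seven data rows of Table 2.6 it should reproduce the reported estimates and standard errors up to rounding.

```python
import math

# Fruit fly data (Table 2.6): concentration (g/100 cc), insects exposed, insects dead.
conc = [0.10, 0.15, 0.20, 0.30, 0.50, 0.70, 0.95]
n = [47, 53, 55, 52, 46, 54, 52]
y = [8, 14, 24, 32, 38, 50, 50]


def fit_logit_irls(x, n, y, tol=1e-10, max_iter=50):
    """Binomial GLM with logit link fitted by Fisher scoring (IRLS).

    Returns (b0, b1, se_b0, se_b1); the standard errors come from the
    inverse of the Fisher information matrix X'WX.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        i00 = i01 = i11 = s0 = s1 = 0.0
        for xi, ni, yi in zip(x, n, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * pi * (1.0 - pi)   # binomial variance, the IRLS weight
            r = yi - ni * pi           # observed minus expected count
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
            s0 += r
            s1 += r * xi
        det = i00 * i11 - i01 * i01
        # Scoring step (X'WX)^{-1} X'(y - n*pi), solved explicitly for the 2x2 case.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)


b0, b1, se0, se1 = fit_logit_irls(conc, n, y)
print(f"Intercept {b0:.4f} (SE {se0:.4f})")
print(f"Conc      {b1:.4f} (SE {se1:.4f})")
```

For the canonical logit link, Fisher scoring and Newton-Raphson coincide, so the iteration converges in a handful of steps from zero starting values.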


Therefore, with the logistic regression model, we can estimate the probability that an insect dies when exposed to a given concentration $\text{conc}_i$ of nicotine using the following expression:

$$\hat\pi(\text{conc}_i) = \frac{e^{-1.7361 + 6.2954 \times \text{conc}_i}}{1 + e^{-1.7361 + 6.2954 \times \text{conc}_i}}$$
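The fitted curve can be evaluated at the observed concentrations, together with the Pearson residuals $(y_i - n_i\hat\pi_i)/\sqrt{n_i\hat\pi_i(1-\hat\pi_i)}$, a common diagnostic for binomial GLMs. This Python sketch uses the rounded estimates from Table 2.7; the helper name `prob_dead` is hypothetical.

```python
import math

B0, B1 = -1.7361, 6.2954   # estimates from Table 2.7


def prob_dead(c: float) -> float:
    """Fitted probability that an insect dies at concentration c (inverse logit)."""
    eta = B0 + B1 * c
    return math.exp(eta) / (1.0 + math.exp(eta))


conc = [0.10, 0.15, 0.20, 0.30, 0.50, 0.70, 0.95]
n = [47, 53, 55, 52, 46, 54, 52]
y = [8, 14, 24, 32, 38, 50, 50]

for c, ni, yi in zip(conc, n, y):
    p = prob_dead(c)
    # Pearson residual: (observed - expected) / sqrt(binomial variance)
    pearson = (yi - ni * p) / math.sqrt(ni * p * (1.0 - p))
    print(f"conc {c:.2f}: observed {yi / ni:.3f}, fitted {p:.3f}, Pearson residual {pearson:+.2f}")
```

A systematic sign pattern in these residuals across concentrations (negative, then positive, then negative) is the kind of curvature that motivates trying higher-order terms in the linear predictor.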


A plot of the mean proportion of dead insects exposed to a given concentration of nicotine, together with the fitted regression curves (linear, quadratic, and cubic linear predictors), is shown in Fig. 2.6. In this figure, we observe that the mean proportion of dead insects increases as the nicotine concentration increases. The best-fitting linear predictor is quadratic.


2.5.3 Poisson Regression 


Often, the outcome of a variable is numerical in the form of counts, sometimes of rare events, for example, (1) the number of plants infected by a certain disease in a population over a period of time, (2) the number of insects surviving after the application of an insecticide over time, (3) the number of dead fish found per cubic kilometer due to a certain pollutant, and (4) the number of sick animals occurring in a given month in a given country. The Poisson probability distribution is perhaps the most widely used distribution for modeling count-type response variables. As $\lambda$ (the average count) increases, the Poisson distribution becomes more symmetric and eventually approaches a normal distribution.

The Poisson likelihood function is appropriate for nonnegative integer data. The Poisson process assumes that events occur randomly over time, so the following conditions must be met:

(a) The probability of at least one occurrence of an event in a given (small) time interval is proportional to the length of the interval.

(b) The probability of two or more occurrences of an event within an extremely small interval is negligible.

(c) The numbers of occurrences of an event in disjoint time intervals are mutually independent.


The probability distribution of a Poisson random variable $Y$, which represents the number of successes occurring in a given time interval or in a given region of space, is given by the expression

$$P(Y = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad \lambda > 0, \quad k = 0, 1, 2, \ldots$$

where $\lambda$ is the average number of successes (the average count) in a time or space interval. The mean and variance of this distribution are the same, that is,

$$E(Y) = \text{Var}(Y) = \lambda$$
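The Poisson probability function, and the equality of its mean and variance, can be checked numerically. In this Python sketch, $\lambda = 4$ is an arbitrary illustrative value and the infinite sum is truncated at $k = 59$, beyond which the remaining probability is negligible.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) = exp(-lam) * lam**k / k!  for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 4.0
ks = range(60)   # truncation point; the tail beyond k = 59 is negligible for lam = 4
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(f"total probability {sum(probs):.6f}, mean {mean:.6f}, variance {var:.6f}")
```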


Poisson regression is a GLM and is appropriate for analyzing count data or contingency tables. A Poisson regression assumes that the response variable $y$ has a Poisson distribution and that the logarithm of its expected value can be modeled by a linear combination of unknown parameters and independent variables. As in a standard linear regression, the predictors $x_1, x_2, \ldots, x_P$, weighted by their coefficients, are summed to form the linear predictor

$$\eta_i = \beta_0 + \sum_{p=1}^{P} x_{pi}\beta_p$$

where $\beta_0$ is the intercept and $\beta_p$ is the slope of the covariate $x_p$ ($p = 1, \ldots, P$). The expected value of $y_i$ and the linear predictor $\eta_i$ are related through the link function. The components of a GLM with a Poisson response, $y_i \sim \text{Poisson}(\lambda_i)$, where $\lambda_i$ is the expected value of $y_i$, are as follows:

Distribution: $y_i \sim \text{Poisson}(\lambda_i)$, with $E(y_i) = \text{Var}(y_i) = \lambda_i$
Linear predictor: $\eta_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_P x_{Pi}$
Link function: $\eta_i = \log(\lambda_i) = g(\lambda_i)$ (log link)

Fig. 2.7 Students infected with the disease


Example 1 The following dataset corresponds to the number of students diagnosed with a certain infectious disease on each day over a period following an initial outbreak (Fig. 2.7). We will fit a generalized linear model for count data assuming a Poisson distribution.


Note that the response distribution is skewed to the right and that the responses are nonnegative integers. Since the response variable is a count, a Poisson distribution with its canonical link, the natural logarithm, is a reasonable initial choice for this dataset. The number of days elapsed after the initial disease outbreak is the predictor variable in the systematic component. Thus, the GLM for this dataset (Appendix: Data: Infected students) is:


Distribution: $\text{Infected students}_i \sim \text{Poisson}(\lambda_i)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \text{Days}_i$
Link function: $\eta_i = \log(\lambda_i)$ (log link)


Part of the data is shown below:

Days elapsed | Infected students
1 | 6
2 | 8
3 | 12
... | ...
109 | 1
110 | 1
112 | 0


For implementation purposes, we use days to denote elapsed days and students to denote infected students. We can fit the Poisson regression model using GLIMMIX in SAS, as shown below:

proc glimmix data=students method=laplace;
   model students = days / solution dist=poisson link=log;
   output out=sal_infection pred(noblup ilink)=predicted resid=residual;
run;


The “proc GLIMMIX” statement invokes the SAS generalized linear mixed model (GLMM) procedure. The “model” statement specifies the response variable and the predictor variable, and the “solution” option requests a listing of the fixed effects parameter estimates. The “dist = poisson” option specifies the distribution of the data, and the “link = log” option declares the link function to be used in the model. The default estimation technique for generalized linear mixed models is restricted pseudo-likelihood (the “RSPL” method); in this example, we use “method = laplace.” The “output” statement creates a dataset containing predicted values and diagnostic residuals, calculated after fitting the model. By default, all variables in the original dataset are included in the output dataset, and “out = sal_infection” specifies the name of the output dataset. The “pred(noblup ilink) = predicted” option calculates the predicted values without taking into account the random effects of the model, and “ilink” returns the predicted values on the scale of the data. Finally, the “resid = residual” option calculates the residuals.

The likelihood of a GLMM involves an integral over the random effects, which, in general, cannot be calculated explicitly. By default, GLIMMIX uses the RSPL method, but it also offers other options, such as adaptive quadrature and the Laplace method, among others. These methods approximate the integral in the marginal likelihood of a GLMM, and the resulting function is then optimized numerically. Unlike pseudo-likelihood, they provide a true objective function (an approximated log likelihood) for optimization. For more details, see the SAS manual. In a GLM, however, this integral approximation is not necessary because there are no random effects, so the likelihood can be evaluated exactly and maximized directly to estimate the parameters. The results of this analysis are shown below (Table 2.8).

The fit statistics in part (a) (“Fit statistics”) give us an idea of how well the model fits; these statistics are very useful when we are comparing different candidate models to find the best model for the data. In this case, the Pearson chi-square statistic divided by its degrees of freedom (Pearson’s chi-square/DF = 0.78) is close to 1. This indicates that the variability of these data has been reasonably modeled and that there is no residual overdispersion. The Pearson chi-square/DF value serves as an estimate of the experimental error (residual dispersion) of the analysis.

Table 2.8 Results of the analysis of variance
(a) Fit statistics (Akaike’s information criterion (AIC), a small sample bias corrected Akaike’s information criterion (AICC), Bozdogan’s consistent Akaike information criterion (CAIC), Schwarz’s Bayesian information criterion (BIC), Hannan and Quinn information criterion (HQIC))
−2 Log likelihood | 389.11
AIC (smaller is better) | 393.11
AICC (smaller is better) | 393.22
BIC (smaller is better) | 398.49
CAIC (smaller is better) | 400.49
HQIC (smaller is better) | 395.29
Pearson’s chi-square | 84.95
Pearson’s chi-square / DF | 0.78
(b) Type III tests of fixed effects
Effect | Num DF | Den DF | F-value | Pr > F
Days | 1 | ... | ... | ...
(c) Parameter estimates
Effect | Estimate | Standard error | DF | t-value | Pr > |t|
Intercept | 1.9902 | 0.08394 | ... | ... | <0.0001
Days | −0.01746 | 0.001727 | ... | −10.11 | <0.0001

The “Type III tests of fixed effects” (part (b)) and the solution for the intercept and the days effect (“Parameter estimates,” part (c)) are shown in Table 2.8. The negative coefficient of the covariate days indicates that as the number of days increases, the average number of students diagnosed with the disease decreases.

We therefore reject the null hypothesis (P < 0.0001) that the expected number of infected students does not change as the number of days increases.

We see that with each 1-day increase in the infection period, the expected (or average) number of students diagnosed with the disease is multiplied by a factor of $e^{\hat\beta_1} = e^{-0.01746} = 0.9827$, that is, it decreases by about 1.7%.

The estimated linear predictor for this GLM is:

$$\hat\eta_i = 1.9902 - 0.01746 \times \text{Days}_i$$


For example, we can calculate the probability of diagnosing $k = 2$ infected students when Days = 2 (2 days after the outbreak) as follows:

$$P(Y_i = k) = \frac{\exp(-\hat\lambda_i)\,\hat\lambda_i^{k}}{k!}, \qquad \hat\lambda_i = \exp(\hat\eta_i)$$

$$P(Y_i = 2) = \frac{\exp\left(-e^{1.9902 - 0.01746 \times 2}\right)\left(e^{1.9902 - 0.01746 \times 2}\right)^{2}}{2!} = \frac{\exp\left(-e^{1.9553}\right)\left(e^{1.9553}\right)^{2}}{2} \approx 0.0213$$

This value indicates that the probability of observing/diagnosing two students with the disease when Days = 2 is approximately 0.0213 (2.13%).

Fig. 2.8 Infected students and a Poisson regression fit
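The rate ratio and the probability above can be reproduced in Python from the Table 2.8 estimates, using $\hat\lambda = \exp(\hat\eta)$ and the Poisson probability function. This is a sketch with rounded estimates, so the last printed digit may differ slightly from a full-precision calculation.

```python
import math

B0, B1 = 1.9902, -0.01746   # Table 2.8 estimates (log link)

rate_ratio = math.exp(B1)        # multiplicative change in the mean per extra day
days, k = 2, 2
lam = math.exp(B0 + B1 * days)   # inverse of the log link gives the expected count
p_k = math.exp(-lam) * lam ** k / math.factorial(k)

print(f"exp(beta1) = {rate_ratio:.4f}")
print(f"lambda at Days = {days}: {lam:.4f}")
print(f"P(Y = {k}) = {p_k:.4f}")
```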

In Fig. 2.8, we observe that the Poisson model is a good candidate for modeling 
this dataset, since there is no overdispersion in this regression model. 


Example 2 A forest engineer is interested in modeling the number of trees recently infected by a certain virus. The data available are the age (years) and height (meters) of the trees and the number of infected trees. Using a linear model could result in negative values of the parameter $\lambda$, which would not make sense. The link function $g(\lambda)$ for a Poisson error structure is the logarithm. Therefore, defining $y_i$ = infected trees$_i$, the GLM is as follows:


Distribution: $y_i \sim \text{Poisson}(\lambda_i)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \times \text{Age}_i + \beta_2 \times \text{Height}_i$
Link function: $\eta_i = \log(\lambda_i) = g(\lambda_i)$ (log link)


For this example, a dataset was simulated using the parameter values $\beta_0 = -2$, $\beta_1 = -0.03$, and $\beta_2 = -0.04$. To obtain the linear predictor, age (years) was varied from 0 to 50 and height (meters) from 0 to 30, both in increments of one unit. Thus, the values of $y_i$ were simulated from a Poisson distribution with mean

$$\lambda_i = \exp(-2 - 0.03 \times \text{Age}_i - 0.04 \times \text{Height}_i)$$

Fig. 2.9 (a, b) Probability of tree infection as a function of tree height and age in years


In Fig. 2.9a, b, we can see that at a young age (between 1 and 10 years) and at a height of no more than 10 meters, trees are more susceptible to infection by the virus. As their age increases, however, trees show greater resistance.
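A simulation of this kind can be sketched in Python with the standard library. Two assumptions are made here: Poisson counts are drawn with Knuth's multiplication method (`poisson_draw` is a hypothetical helper, adequate for the small means involved), and the random seed is arbitrary, so the simulated counts will not match the authors' dataset.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mean_infected(age: float, height: float) -> float:
    """lambda = exp(-2 - 0.03*Age - 0.04*Height), the inverse of the log link."""
    return math.exp(-2.0 - 0.03 * age - 0.04 * height)


rng = random.Random(2024)  # arbitrary seed, only for reproducibility
# Grid described in the text: age 0..50 years and height 0..30 m in steps of one unit.
data = [(age, height, poisson_draw(mean_infected(age, height), rng))
        for age in range(51) for height in range(31)]

print(len(data), "simulated trees; total infected:", sum(count for _, _, count in data))
```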

The following SAS code fits a Poisson regression model with two predictor 
variables, assuming that there is no interaction between the two explanatory 
variables. 


proc glimmix data=infection method=laplace; 

model infection=age height /solution dist=poisson link=log; 
output out=sal_infection pred (noblup ilink)=predicted 
resid=residual; 

run; 


In Table 2.9 part (a), the analysis of variance shows that age and tree height are 
highly significant, indicating that both variables help explain the infection mecha- 
nism of the trees through a Poisson model (P < 0.0001). 

The estimated linear predictor for this GLM, with a Poisson-distributed response variable, is:

$$\hat\eta_i = -2 - 0.03 \times \text{Age}_i - 0.04 \times \text{Height}_i$$


The estimated values of the parameters of the explanatory variables indicate that as age (years) and height (meters) increase by one unit, the tree becomes less susceptible to the virus. If we want to calculate the probability of diagnosing $k = 3$ infected trees when they are 2 years old and 3 meters tall, we can use the following equation:
$$P(Y_i = k) = \frac{\exp(-\hat\lambda_i)\,\hat\lambda_i^{k}}{k!}, \qquad \hat\lambda_i = \exp(-2 - 0.03 \times \text{Age}_i - 0.04 \times \text{Height}_i)$$

$$P(Y_i = 3) = \frac{\exp\left(-e^{-2 - 0.03 \times 2 - 0.04 \times 3}\right)\left(e^{-2 - 0.03 \times 2 - 0.04 \times 3}\right)^{3}}{3!} = 0.000215$$

Table 2.9 Part of the results of the analysis of variance under a Poisson distribution
(a) Type III tests of fixed effects
Effect | Num DF | Den DF | F-value | Pr > F
Age | 1 | 6158 | 43.20 | <0.0001
Height | 1 | 6158 | 29.10 | <0.0001
(b) Parameter estimates
Effect | Estimate | Standard error | DF | t-value | Pr > |t|
Intercept | −2.0000 | 0.1388 | 6158 | −14.41 | <0.0001
Age | −0.03000 | 0.004564 | 6158 | −6.57 | <0.0001
Height | −0.04000 | 0.007415 | 6158 | −5.39 | <0.0001


This value indicates that the probability of observing/diagnosing three trees infected with the virus when they are 2 years old and 3 meters tall is 0.000215 (0.0215%).
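The same calculation in Python, using the parameter values of the linear predictor above (the function name `poisson_prob` and its default arguments are illustrative):

```python
import math


def poisson_prob(k, age, height, b0=-2.0, b_age=-0.03, b_height=-0.04):
    """P(Y = k) for a Poisson GLM with log link and two covariates."""
    lam = math.exp(b0 + b_age * age + b_height * height)   # inverse of the log link
    return math.exp(-lam) * lam ** k / math.factorial(k)


p3 = poisson_prob(3, age=2, height=3)
print(f"P(Y = 3 | age 2, height 3) = {p3:.6f}")
```

Here $\hat\lambda = e^{-2.18} \approx 0.113$, so observing three infected trees is very unlikely.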

A Poisson regression model, sometimes referred to as a log-linear model, is especially useful for modeling contingency tables. Log-linear models describe the associations between the variables of a contingency table; they treat the variables symmetrically and do not single out one variable as the response. They apply to two-way and higher-way tables and can be fitted by binomial or Poisson regression. These models for contingency tables have many applications in the biological and social sciences.

Variables can be nominal or ordinal. A nominal variable has no natural order, for example, gender (male, female, transgender), eye color (blue, brown, green), and type of pet (cat, bird, fish, dog, mouse). An ordinal variable has categories with a natural order, for example, the degree of consumer satisfaction with a product (very dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied).


2.5.4 Gamma Regression 


The gamma distribution arises naturally in processes for which the waiting times between events are relevant. Lifetime data are sometimes modeled with a gamma distribution. This distribution can take a wide range of shapes because of the relationship between the mean and the variance through its two parameters ($\alpha$ and $\beta$), and it is suitable for dealing with heteroscedasticity in nonnegative data. The probability density of a particular value $y$, given the parameters $\alpha$ and $\beta$, is

$$f(y; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, y^{\alpha-1} e^{-y/\beta}, \qquad y > 0, \quad \alpha, \beta > 0$$


where $\Gamma(\cdot)$ is the gamma function. A gamma regression uses the input variables $X$ and their coefficients to predict the mean of $y$; because the shape parameter $\alpha$ is held constant, modeling the mean amounts to modeling the scale parameter $\beta$. The mean and variance of a gamma random variable are:

$$E(Y) = \alpha\beta = \mu \quad \text{and} \quad \text{Var}(Y) = \alpha\beta^{2} = \mu^{2}/\alpha$$


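The gamma probability density function can be rewritten in terms of the mean $\mu$ and the shape parameter $\alpha$ as follows:

$$f(y; \mu, \alpha) = \frac{1}{\Gamma(\alpha)}\left(\frac{\alpha}{\mu}\right)^{\alpha} y^{\alpha-1} \exp\left(-\frac{\alpha y}{\mu}\right), \qquad y > 0$$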
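The two forms of the gamma density should agree when $\mu = \alpha\beta$. This Python sketch (illustrative values $\alpha = 2$, $\beta = 3$) evaluates both forms and uses a midpoint-rule integral to check that the mean is $\alpha\beta$ and the variance is $\alpha\beta^2 = \mu^2/\alpha$.

```python
import math


def gamma_pdf(y: float, alpha: float, beta: float) -> float:
    """Shape-scale form: y**(alpha-1) * exp(-y/beta) / (Gamma(alpha) * beta**alpha)."""
    return y ** (alpha - 1) * math.exp(-y / beta) / (math.gamma(alpha) * beta ** alpha)


def gamma_pdf_mean(y: float, mu: float, alpha: float) -> float:
    """Mean-shape form: (alpha/mu)**alpha * y**(alpha-1) * exp(-alpha*y/mu) / Gamma(alpha)."""
    return (alpha / mu) ** alpha * y ** (alpha - 1) * math.exp(-alpha * y / mu) / math.gamma(alpha)


alpha, beta = 2.0, 3.0      # illustrative values
mu = alpha * beta           # E(Y) = alpha * beta
for y in (0.5, 2.0, 6.0, 15.0):
    print(f"y = {y:5.1f}   f(y; alpha, beta) = {gamma_pdf(y, alpha, beta):.6f}   "
          f"f(y; mu, alpha) = {gamma_pdf_mean(y, mu, alpha):.6f}")

# Midpoint-rule check of the moments on (0, 200), where almost all the mass lies.
h = 0.001
grid = [h * (i + 0.5) for i in range(200_000)]
dens = [gamma_pdf(v, alpha, beta) for v in grid]
mean = h * sum(v * d for v, d in zip(grid, dens))
var = h * sum((v - mean) ** 2 * d for v, d in zip(grid, dens))
print(f"mean ~ {mean:.4f} (mu = {mu}), variance ~ {var:.4f} (mu^2/alpha = {mu ** 2 / alpha})")
```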


Fig. 2.10 plots the gamma density for three different values of the shape parameter, $\alpha$ = 0.75, 1, and 2; the mean $\mu$ has a multiplicative (scaling) effect. In the first panel ($\alpha = 0.75$), the density is infinite at 0; in the second panel ($\alpha = 1$), it corresponds to the exponential density; and in the third panel ($\alpha = 2$), we see a right-skewed distribution.
Fig. 2.10 Gamma density: from left to right, α = 0.75, 1, and 2
A gamma distribution can arise in different ways. The sum of $n$ independent and identically distributed exponential random variables with scale parameter $\beta$ has a Gamma$(n, \beta)$ distribution. The chi-squared distribution $\chi^2_{\nu}$ is a special case of the gamma distribution with $\alpha = \nu/2$ and $\beta = 2$ (that is, rate 1/2), where $\nu$ is the degrees of freedom.
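The sum-of-exponentials property can be illustrated by simulation. In this Python sketch, the choice of three exponentials with scale $\beta = 2$ and the random seed are arbitrary; the sum is then Gamma(3, 2), which is also $\chi^2_6$, so the sample mean and variance should be close to $\alpha\beta = 6$ and $\alpha\beta^2 = 12$.

```python
import random

rng = random.Random(7)   # arbitrary seed for reproducibility
n_terms, beta = 3, 2.0   # illustrative: Gamma(alpha=3, beta=2), i.e. chi-squared with 6 df

# random.expovariate takes the *rate*, so rate 1/beta gives exponentials with mean beta.
sums = [sum(rng.expovariate(1.0 / beta) for _ in range(n_terms)) for _ in range(100_000)]

mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / (len(sums) - 1)
print(f"sample mean {mean:.3f}   (alpha*beta = {n_terms * beta})")
print(f"sample variance {var:.3f}   (alpha*beta**2 = {n_terms * beta ** 2})")
```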

In theory, the gamma distribution is a natural choice when the response variable takes positive real values, and it is appropriate when a fixed relationship between the mean and the variance is suspected: because $\text{Var}(Y) = \mu^2/\alpha$, the standard deviation is proportional to the mean. If we expect the values of $y$ to be small, then we should expect a small amount of variability in the observed values. Conversely, if we expect large values of $y$, then we should expect (and observe) a lot of variability.

Gamma models with multiplicative covariate effects, that is, gamma models with a log link, are another useful option for modeling nonnegative, right-skewed continuous responses. Whether the data are modeled with an inverse or a logarithmic link function will depend on whether the rate of change or the logarithm of the rate of change is the more meaningful measure. For example, in yield-density studies, it is commonly assumed that yield per plant is inversely proportional to plant density (Shinozaki and Kira 1956), so the model for the mean yield per plant is:

$$\mu_i = (\beta_0 + \beta_1 x_i)^{-1}, \quad \text{i.e.,} \quad \eta_i = \frac{1}{\mu_i} = \beta_0 + \beta_1 x_i$$

where $x_i$ is the plant density.


Example 1 In the development of coagulation agents, it is common to perform in vitro clotting time studies. The following data were reported by McCullagh and Nelder (1989). Plasma samples from healthy men were diluted to nine different percentages of prothrombin-free plasma concentration; the greater the dilution, the more interference with the ability of the blood to clot, because the natural clotting ability of the blood has been weakened. For each sample, clotting was induced by introducing thromboplastin, a clotting agent, and the time until clotting occurred (in seconds) was recorded. Five samples were measured at each of the nine concentration percentages, and the clotting times were averaged; therefore, the response is the mean clotting time across the five samples. In Fig. 2.11, where the response variable is plotted against the percentage concentration, we observe that longer clotting times tend to be more variable than shorter ones, so a linear regression model may not be appropriate.


In this analysis, we will model clotting time as the response variable ($y_i$), with the plasma concentration percentage as the predictor variable; conc denotes the independent variable concentration. The GLM for this dataset is:

Distribution: $y_i = \text{Clotting time}_i \sim \text{Gamma}(\alpha, \beta)$
Linear predictor: $\eta_i = \beta_0 + \beta_1 \times \text{conc}_i$
Link function: $\eta_i = \dfrac{1}{\mu_i} = \beta_0 + \beta_1 \times \text{conc}_i$ (inverse link)
Fig. 2.11 Clotting time (seconds), depending on the thromboplastin concentration


The following syntax fits a GLM with gamma errors in GLIMMIX:

data coagu;
   input num conc y;
   datalines;
1 5 118
2 10 58
3 15 42
4 20 35
5 30 27
6 40 25
7 60 21
8 80 19
9 100 18
;
run;

proc glimmix data=coagu;
   model y = conc / dist=gamma link=power(-1) solution;
   output out=salgamml pred(noblup ilink)=predicted resid=residual;
run;


Most of the syntax has already been described in the previous examples; the only new element is the link = power(−1) option, which invokes the inverse link function in the GLIMMIX procedure.
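The gamma fit can be reproduced outside SAS with a short IRLS routine for the inverse link. This Python sketch is an illustrative re-implementation, not GLIMMIX itself, and it estimates only the regression coefficients (the gamma shape does not affect them). The standard errors are then computed as $\sqrt{\hat\phi\,[(X^\top W X)^{-1}]_{jj}}$ with $W = \operatorname{diag}(\hat\mu_i^2)$, under the assumption that the “Scale” reported in Table 2.10 (0.05213) acts as a dispersion $\phi$ with $\text{Var}(Y) = \phi\mu^2$; with that reading, the sketch reproduces the tabulated standard errors up to rounding.

```python
import math

# Clotting data from the SAS datalines: concentration (%) and mean clotting time (s).
conc = [5, 10, 15, 20, 30, 40, 60, 80, 100]
y = [118, 58, 42, 35, 27, 25, 21, 19, 18]


def weighted_sums(x, b0, b1):
    """Entries of X'WX with W = mu**2, the IRLS weight for gamma with inverse link."""
    s00 = s01 = s11 = 0.0
    for xi in x:
        w = (1.0 / (b0 + b1 * xi)) ** 2
        s00 += w
        s01 += w * xi
        s11 += w * xi * xi
    return s00, s01, s11


def fit_gamma_inverse(x, y, tol=1e-12, max_iter=100):
    """Gamma GLM with inverse link, 1/mu = b0 + b1*x, fitted by IRLS."""
    m = len(x)
    xbar = sum(x) / m
    zbar = sum(1.0 / v for v in y) / m
    # Starting values: ordinary least squares of 1/y on x.
    b1 = (sum((xi - xbar) * (1.0 / yi - zbar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = zbar - b1 * xbar
    for _ in range(max_iter):
        s00, s01, s11 = weighted_sums(x, b0, b1)
        t0 = t1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = 1.0 / eta
            z = eta - (yi - mu) / mu ** 2   # working response: eta + (y - mu) * d(eta)/d(mu)
            w = mu ** 2
            t0 += w * z
            t1 += w * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


b0, b1 = fit_gamma_inverse(conc, y)
phi = 0.05213   # "Scale" from Table 2.10, read here as the dispersion (assumption)
s00, s01, s11 = weighted_sums(conc, b0, b1)
det = s00 * s11 - s01 * s01
se0, se1 = math.sqrt(phi * s11 / det), math.sqrt(phi * s00 / det)
print(f"Intercept {b0:.6f} (SE {se0:.6f})")
print(f"Conc      {b1:.6f} (SE {se1:.6f})")
```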

Some of the output from this analysis is shown in Table 2.10. 

Part (a) of Table 2.10 shows that the dilution percentage of the blood plasma significantly affects the clotting time (P = 0.0004). The values for constructing the fitted linear predictor are tabulated in part (b) of Table 2.10:

$$\hat\eta_i = 0.008686 + 0.000658 \times \text{conc}_i$$
Table 2.10 Results of the analysis under a gamma distribution
(a) Type III tests of fixed effects
Effect | Num DF | Den DF | F-value | Pr > F
Conc | 1 | 7 | 41.01 | 0.0004
(b) Parameter estimates
Effect | Estimate | Standard error | DF | t-value | Pr > |t|
Intercept | 0.008686 | 0.002294 | 7 | 3.79 | 0.0068
Conc | 0.000658 | 0.000103 | 7 | 6.40 | 0.0004
Scale | 0.05213 | 0.02436 | | |


With the parameterization of the gamma distribution chosen previously, GLIMMIX in SAS calculated the intercept and the coefficient of the concentration variable, as well as a scale parameter, reported as “Scale” in the output. For the gamma distribution, this estimate is the dispersion $\hat\phi = 1/\hat\alpha = 0.05213$, so that $\text{Var}(Y) = \mu^2/\alpha = \phi\mu^2$. With this information, we can calculate the mean ($E[Y] = \mu$) and the variance for a concentration conc = 10 as follows:

$$\hat\mu = \frac{1}{0.008686 + 0.000658 \times \text{conc}} = \frac{1}{0.008686 + 0.000658 \times 10} = 65.505$$

$$\widehat{\text{Var}}(y) = \hat\phi\,\hat\mu^{2} = \frac{\hat\mu^{2}}{\hat\alpha} = 0.05213 \times 65.505^{2} \approx 223.69$$

The average time it takes for blood to clot at a concentration of 10% is 65.505 seconds, with a variance of about 223.69 (a standard deviation of about 15 seconds).
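The fitted mean clotting time at each of the nine concentrations follows from inverting the link, $\hat\mu = 1/\hat\eta$. A minimal Python sketch with the Table 2.10 estimates:

```python
B0, B1 = 0.008686, 0.000658   # Table 2.10 estimates (inverse link)

fitted = {}
for c in [5, 10, 15, 20, 30, 40, 60, 80, 100]:
    fitted[c] = 1.0 / (B0 + B1 * c)   # mu = 1 / eta
    print(f"conc {c:3d}%: fitted mean clotting time {fitted[c]:6.2f} s")
```

Because the linear predictor increases with concentration, the fitted mean clotting time decreases monotonically.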


2.5.4.1 Model Selection 


Selecting, from a set of candidate models, the model that provides the best fit and 
explains most of the variability in the data is a necessary but complex task. This 
process involves trying to minimize information loss. In information theory, several 
information criteria have been proposed to quantify information, or the expected 
value of information; among these, the most widely used are the Akaike information 
criterion (AIC) (Akaike 1973, 1974) and the Bayesian information criterion (BIC) 
(Schwarz 1978). Both AIC and BIC are based on the ML estimates of the model 
parameters. In a regression fit with normally distributed errors, the estimates of the 
$\beta_j$'s under ordinary least squares and under ML are identical; the two methods 
differ only in how they estimate the common variance $\sigma^2$ of the normal 
distribution of the errors around the true mean. 
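The criteria reported in Table 2.11 can all be computed from −2 log likelihood once the number of estimated parameters $k$ and the number of observations $n$ are known. The Python sketch below uses the standard definitions of AIC, AICC, BIC, CAIC, and HQIC. The values $n = 9$ and $k = 3, 4, 5$ (intercept, polynomial terms, and scale) are assumptions inferred from the reported statistics, which these formulas reproduce to within rounding.

```python
import math


def fit_statistics(m2ll: float, k: int, n: int) -> dict:
    """Information criteria from -2 log likelihood (m2ll), k parameters, n observations."""
    log_n = math.log(n)
    return {
        "AIC": m2ll + 2 * k,
        "AICC": m2ll + 2 * k * n / (n - k - 1),
        "BIC": m2ll + k * log_n,
        "CAIC": m2ll + k * (log_n + 1),
        "HQIC": m2ll + 2 * k * math.log(log_n),
    }


# -2 log likelihood values from Table 2.11; k and n are inferred assumptions.
MODELS = {"Model 1": (62.15, 3), "Model 2": (44.49, 4), "Model 3": (27.47, 5)}
N_OBS = 9

if __name__ == "__main__":
    results = {name: fit_statistics(m2ll, k, N_OBS) for name, (m2ll, k) in MODELS.items()}
    for name, stats in results.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
    # Smaller is better for every criterion.
    best = min(results, key=lambda name: results[name]["AIC"])
    print("Smallest AIC:", best)
```

Each criterion adds a different penalty for the number of parameters to −2 log likelihood, which is why the ranking of the three models is the same across criteria here even though the absolute values differ.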
Table 2.11 Goodness-of-fit metrics for each of the three models and regression analysis results for Model 3 

(a) Fit statistics Model 1 Model 2 Model 3 
−2 Log likelihood 62.15 44.49 27.47 
AIC (smaller is better) 68.15 52.49 37.47 
AICC (smaller is better) 72.95 62.49 57.47 
BIC (smaller is better) 68.74 53.27 38.45 
CAIC (smaller is better) 71.74 57.27 43.45 
HQIC (smaller is better) 66.87 50.78 35.34 
Pearson's chi-square 0.50 0.07 0.01 
Pearson's chi-square / DF 0.07 0.01 0.001 

(b) Type III tests of fixed effects 
Effect Num DF Den DF F-value Pr > F 
Conc 1 5 476.73 <0.0001 
Conc × conc 1 5 110.78 0.0001 
Conc × conc × conc 1 5 50.92 0.0008 

(c) Parameter estimates 
Effect Estimate Standard error DF t-value Pr > |t| 
Intercept −0.00040 0.000576 5 −0.70 0.5177 
Conc 0.001946 0.000089 5 21.83 <0.0001 
Conc × conc −0.00003 2.576E−6 5 −10.53 0.0001 
Conc × conc × conc 1.337E−7 2.520E−8 5 5.306 <0.0001 
Scale 0.001125 0.000530 


To illustrate how these goodness-of-fit statistics are used, let us compare three 
candidate models for the effect of the plasma dilution percentage: 

Model 1: $\eta_i = \beta_0 + \beta_1 \times \text{conc}_i$ 

Model 2: $\eta_i = \beta_0 + \beta_1 \times \text{conc}_i + \beta_2 \times \text{conc}_i^2$ 

Model 3: $\eta_i = \beta_0 + \beta_1 \times \text{conc}_i + \beta_2 \times \text{conc}_i^2 + \beta_3 \times \text{conc}_i^3$ 


Since the proposed models have a gamma error structure, the coefficient of 
determination ($R^2$) commonly used in simple linear regression is not reported. Part 
of the results of this analysis, with various goodness-of-fit measures, is shown in Table 2.11. 

For all of the goodness-of-fit metrics in part (a) of Table 2.11, smaller values 
indicate a better fit. On this basis, the fit of the three regression models improved as 
the degree of the polynomial in the linear predictor increased; that is, Model 3 best 
explained the variability in plasma clotting time. The Type III tests of fixed effects 
and the parameter estimates under Model 3 are given in parts (b) and (c) of 
Table 2.11, respectively. 

Fig. 2.12 Fitting the gamma regression model with three predictors 

The parameter estimates for the linear, quadratic, and cubic effects in the linear 
predictor are all highly significant. The results suggest that including a cubic effect of 
the plasma dilution percentage in the linear predictor explains clotting time better 
than a linear predictor with only a linear effect or with linear and quadratic effects 
(Fig. 2.12). 


2.5.5 Beta Regression 


Studies in many fields, including agriculture, often need to explain a variable 
expressed as a proportion, percentage, rate, or fraction in the continuous range (0, 1). 
In economics, for example, researchers have studied the factors that influence the 
proportion of households without a cement floor. Similarly, in plant breeding, 
researchers want to identify the factors that influence the proportion of plant leaves 
damaged by a given disease. The proportion of impurities in chemical compounds is 
likewise of everyday interest in physics and chemistry. Studies of electoral 
preferences analyze citizen participation rates and the variables that explain them, 
and, in education, researchers try to explain the proportion of successes on 
standardized tests. Proportion models are also used to identify the factors associated 
with the proportion of available credit used by credit card holders. The public health 
field has also needed to model coverage proportions in health programs in order to 
identify the sociodemographic and economic characteristics associated with whether 
a woman is covered. Johnson et al. (1995) presented the properties of the probability 
distribution of this type of variable and showed that the beta distribution can be used 
to model proportions, since its density can take different shapes depending on the 
values of the two shape parameters that index the distribution. However, beta 
regression, which results from using the beta distribution for the response variable in 
the context of generalized linear models, is not very well known, although its use is 
growing thanks to user-friendly software that makes it easy to implement. 


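Definition Let $Y$ be a continuous random variable defined on the interval (0, 1), and 
let $\alpha, \beta > 0$. Then, $Y$ has a beta distribution with shape parameters $\alpha$ and $\beta$ if 
and only if 

$$f(y) = \frac{1}{B(\alpha, \beta)}\, y^{\alpha - 1}(1 - y)^{\beta - 1}, \quad 0 < y < 1,$$

where $B(\alpha, \beta)$ is the beta function, defined as $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$, and $\Gamma$ is the 
gamma function. The mean and variance of this distribution are given by 

$$E(Y) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \mathrm{Var}(Y) = \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}$$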

In the context of regression analysis, the beta density given above is not convenient 
for modeling the mean of the response. Therefore, the density is reparameterized so 
that it contains a precision (or dispersion) parameter. This reparameterization 
consists of defining $\mu = \frac{\alpha}{\alpha + \beta}$ and $\phi = \alpha + \beta$, i.e., $\alpha = \mu\phi$ and $\beta = (1 - \mu)\phi$, 
which means that 

$$E(Y) = \mu \quad \text{and} \quad \mathrm{Var}(Y) = \frac{\mu(1 - \mu)}{1 + \phi}$$

Thus, $\mu$ is the mean of the response variable, and $\phi$ can be interpreted as a 
precision parameter in the sense that, for a fixed $\mu$, the larger the value of $\phi$, the 
smaller the variance of $Y$. The density function of $Y$ can then be written as 

$$f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)}\, y^{\mu\phi - 1}(1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1,$$

where $0 < \mu < 1$ and $\phi > 0$. 
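As a numerical check of the reparameterization, the sketch below evaluates the beta density in terms of $(\mu, \phi)$ using only Python's standard library and integrates it with a simple midpoint rule; the total probability, mean, and variance should agree with 1, $\mu$, and $\mu(1 - \mu)/(1 + \phi)$. The values $\mu = 0.3$ and $\phi = 9$ in the usage are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def beta_shapes(mu: float, phi: float):
    """Map (mu, phi) back to the shape parameters: alpha = mu*phi, beta = (1 - mu)*phi."""
    return mu * phi, (1.0 - mu) * phi


def beta_pdf(y: float, mu: float, phi: float) -> float:
    """Reparameterized beta density for 0 < y < 1, computed on the log scale."""
    a, b = beta_shapes(mu, phi)
    log_const = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def beta_variance(mu: float, phi: float) -> float:
    """Closed form: Var(Y) = mu * (1 - mu) / (1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)


def numeric_moments(mu: float, phi: float, steps: int = 20_000):
    """Total mass, mean, and variance of the density by the midpoint rule on (0, 1)."""
    h = 1.0 / steps
    mass = first = second = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        w = beta_pdf(y, mu, phi) * h
        mass += w
        first += y * w
        second += y * y * w
    return mass, first, second - first ** 2


if __name__ == "__main__":
    mass, m, v = numeric_moments(0.3, 9.0)
    print(f"mass = {mass:.6f}, mean = {m:.6f}, variance = {v:.6f}")
    print(f"closed-form variance = {beta_variance(0.3, 9.0):.6f}")
```

Increasing $\phi$ while holding $\mu$ fixed shrinks the variance, which is the sense in which $\phi$ acts as a precision parameter.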

Let $y_1, y_2, \ldots, y_n$ be independent random variables, where each $y_i$, with 
$i = 1, 2, \ldots, n$, follows the reparameterized beta density with mean $\mu_i$ and unknown 
precision parameter $\phi$. The beta regression model is obtained by assuming that the 
mean of $y_i$ can be written as 

$$g(\mu_i) = \sum_{j=1}^{k} x_{ij}\beta_j = \eta_i$$

where $\beta_1, \beta_2, \ldots, \beta_k$ are unknown regression parameters and $x_{i1}, \ldots, x_{ik}$ are 
the values of $k$ fixed and known covariates ($k < n$). Finally, $g(\cdot)$ is a strictly 
monotone and differentiable link function that maps the interval (0, 1) onto the real line. 

Table 2.12 Proportion of fruit damage (y) as a function of concentration. Percentage is equal to proportion × 100 

Concentration y 
0.1 0.08 
0.25 0.09 
0.5 0.11 
1 0.2 
2 0.3 
4 0.53 
5 0.63 
8 0.71 
10 0.73 
25 0.84 
50 0.85 
100 0.86 

There are several possible choices for the link function $g(\cdot)$. For example, we can 
use the logit link, $g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right)$, which is considered the most popular and 
asymptotically efficient; it is also feasible to use the probit link, $g(\mu) = \Phi^{-1}(\mu)$, 
where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random 
variable, or the complementary log-log link, $g(\mu) = \log(-\log(1 - \mu))$, among others 
(McCullagh and Nelder 1989). 
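The three links and their inverses are short enough to write out directly. The Python sketch below uses the standard library's `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$; it is only meant to show that each link maps (0, 1) onto the real line and that applying the inverse link recovers $\mu$. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logit(mu: float) -> float:
    return math.log(mu / (1.0 - mu))


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def probit(mu: float) -> float:
    return STD_NORMAL.inv_cdf(mu)  # Phi^{-1}(mu)


def inv_probit(eta: float) -> float:
    return STD_NORMAL.cdf(eta)  # Phi(eta)


def cloglog(mu: float) -> float:
    return math.log(-math.log(1.0 - mu))


def inv_cloglog(eta: float) -> float:
    return 1.0 - math.exp(-math.exp(eta))


LINKS = {
    "logit": (logit, inv_logit),
    "probit": (probit, inv_probit),
    "cloglog": (cloglog, inv_cloglog),
}

if __name__ == "__main__":
    for name, (g, g_inv) in LINKS.items():
        etas = [round(g(mu), 3) for mu in (0.08, 0.5, 0.86)]
        print(name, etas)
```

The logit and probit links are symmetric around $\mu = 0.5$, whereas the complementary log-log link is not, which is one practical reason to compare them when proportions are close to 0 or 1.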


Example 1 The objective of this experiment was to evaluate the effect of the 
concentration of a chemical compound on the proportion of damage (y) in the fruits 
(Table 2.12). This compound is known to inhibit the growth of an insect, but, at a 
certain concentration, it can cause damage to the fruits. 


The proportion of damage to the fruits can be modeled with a beta($\mu$, $\phi$) 
distribution. Let $y_i$ be the proportion of damage to the fruits at the $i$th concentration. 
The GLM components for this dataset are as follows: 

Distribution: $y_i \sim \text{beta}(\mu_i, \phi)$, with $E(y_i) = \mu_i$ and $\mathrm{Var}(y_i) = \frac{\mu_i(1 - \mu_i)}{1 + \phi}$ 

Linear predictor: $\eta_i = \beta_0 + \beta_1 \times \text{conc}_i$ 

Link function: $\eta_i = \text{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right)$ (logit link) 
Table 2.13 Fit statistics for the linear and quadratic models and results of the quadratic model fit 

(a) Fit statistics Linear Quadratic 
−2 Log likelihood −6.23 −14.50 
AIC (smaller is better) −0.23 −6.50 
AICC (smaller is better) 2.77 −0.79 
BIC (smaller is better) 1.22 −4.56 
CAIC (smaller is better) 4.22 −0.56 
HQIC (smaller is better) −0.77 −7.22 
Pearson's chi-square 12.85 15.05 
Pearson's chi-square / DF 1.07 1.25 

(b) Type III tests of fixed effects 
Effect Num DF Den DF F-value Pr > F 
Conc 1 9 8.08 0.0193 
Conc × conc 1 9 6.11 0.0354 

(c) Parameter estimates 
Effect Estimate Standard error DF t-value Pr > |t| 
Intercept −1.1425 0.2935 9 −3.89 0.0037 
Conc 0.1572 0.05530 9 2.84 0.0193 
Conc × conc −0.00132 0.000534 9 −2.47 0.0354 
Scale 9.0432 4.0045 


Note that we are using conc to denote the independent variable concentration of 
the chemical compound. The following SAS code allows us to perform a beta 
regression for the dataset: 


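proc glimmix method=laplace;
/* beta regression (logit link by default) with a linear predictor in conc */
model y = conc / dist=beta s;
/* quadratic predictor: model y = conc conc*conc / dist=beta s; */
run;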


The “method=laplace” option asks SAS to use the Laplace approximation as the 
estimation method, and the “dist=beta” and “s” (solution) options ask GLIMMIX to 
perform a beta regression and to print the fixed-effects parameter estimates, respectively. 

To determine whether a linear, quadratic, or cubic predictor best explains the 
observed variability in a dataset, we use the fit statistics (−2 log likelihood, 
AIC, etc.). Part of the output is shown in Table 2.13. According to the fit 
statistics in part (a), the quadratic predictor best models this experiment. 

Fig. 2.13 suggests that a cubic linear predictor is the best for modeling this dataset; 
however, because the t-values in the hypothesis tests of the estimated parameters 
were indeterminate (infinite; not shown here), the quadratic predictor was chosen. 
Both the quadratic and the cubic predictors model the proportion (percentage = 
proportion × 100) of fruit damage caused by the concentration of the applied 
chemical better than the linear predictor. 
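To move from the model scale to the data scale for the chosen quadratic predictor, the rounded estimates from part (c) of Table 2.13 can be plugged into the linear predictor and passed through the inverse logit. The Python sketch below does this for the concentrations in Table 2.12. It assumes that the reported Scale (9.0432) is the precision parameter $\phi$, so that $\mathrm{Var}(y_i) = \mu_i(1 - \mu_i)/(1 + \phi)$; because the coefficients are rounded, the fitted values are approximate.

```python
import math

# Rounded estimates from part (c) of Table 2.13 (quadratic predictor, logit link).
B0, B1, B2 = -1.1425, 0.1572, -0.00132
PHI = 9.0432  # reported "Scale", assumed here to be the precision parameter phi


def fitted_mean(conc: float) -> float:
    eta = B0 + B1 * conc + B2 * conc ** 2  # model (logit) scale
    return 1.0 / (1.0 + math.exp(-eta))    # data scale via the inverse logit


def fitted_variance(conc: float, phi: float = PHI) -> float:
    mu = fitted_mean(conc)
    return mu * (1.0 - mu) / (1.0 + phi)


# Concentrations and observed proportions of fruit damage from Table 2.12.
DATA = [(0.1, 0.08), (0.25, 0.09), (0.5, 0.11), (1, 0.2), (2, 0.3), (4, 0.53),
        (5, 0.63), (8, 0.71), (10, 0.73), (25, 0.84), (50, 0.85), (100, 0.86)]

if __name__ == "__main__":
    for conc, observed in DATA:
        print(f"{conc:>6} observed = {observed:.2f} "
              f"fitted = {fitted_mean(conc):.3f} var = {fitted_variance(conc):.4f}")
```

The inverse logit guarantees that every fitted mean stays inside (0, 1), whatever the value of the quadratic linear predictor.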
Fig. 2.13 Fitting the beta regression model 


2.6 Exercises 


Exercise 2.6.1 The partial dataset corresponds to an evaluation of the effects of 
increasing application rates of picloram (0, 1.1, 2.2, and 4.5 kg/ha) on the control of 
larkspur plants (data in Table 2.14). The objective of this study was to evaluate the 
efficacy of the herbicide picloram in controlling larkspur plants. 


(a) List and describe the components of the GLM (distribution, systematic compo- 
nent (predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your results. 


Exercise 2.6.2 Effect of pH, Brix, temperature, and nisin concentration on the 
growth of Alicyclobacillus acidoterrestris CRA7152 in apple juice. The objective 
of this experiment was to model the presence/absence of CRA7152 growth in apple 
juice as a function of pH (3.5–5.5), Brix (11–19), temperature (25–50 °C), and nisin 
concentration (0–70). The data are shown below (Table 2.15): 


(a) List and describe the components of the GLM (distribution, systematic compo- 
nent (predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your findings. 


Exercise 2.6.3 The objective of this experiment was to evaluate the toxicity of 
concentrations of pyrethrin and piperonyl butoxide on the mortality of beetles 
(Tribolium castaneum). Pyrethrin is a natural insecticide found in the plant 
Chrysanthemum cinerariaefolium and its flowers. The active ingredients are 
pyrethrins I and II, cinerins I and II, and jasmolins I and II. The dried flowers contain 
0.9–1.3% pyrethrum. The crude extract contains 50–60% pyrethrum and is imported from 
Table 2.14 Toxicity of picloram in controlling larkspur plants 


Y 


Conc 


Rep 


1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 


Y 


Conc 


1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 


Rep 


y 


Conc 


1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 
1.1 


Rep 


(continued) 
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Table 2.14 (continued) 


2 Generalized Linear Models 


Rep Conc y Rep Conc y Rep Conc Y 
1 22 0 1.1 1 1.1 0 
1 2.2 1 1.1 1 1.1 0 
1 2:2 1 1.1 1 1.1 0 
1 212, 1 1.1 1 1.1 0 
1 22 1 141 1 1.1 0 
1 22 1 141 1 1.1 0 
1 2:2: 1 1.1 1 1.1 0 
1 2.2 1 1.1 1 1.1 0 
1 2.2; 1 2.2 0 1.1 0 
1 22 1 2.2. 0 1.1 1 
1 2.2 1 2:2 0 1.1 1 
1 222; 1 22 0 1.1 1 
1 2.2: 1 2.2 0 1.1 1 
1 2:2; 1 22 0 141 1 
1 22 1 22 1 1.1 1 
1 22 1 2.2 1 1.1 1 
1 22 1 22 1 1.1 1 
1 22 1 22 1 1.1 1 
1 22 1 2.2 1 22 0 
1 22 1 22 1 2:2 0 
1 22 1 2.2 1 22 1 
1 2.2; 1 2.2 1 222 1 
1 22 1 2.2 1 22 1 
1 22 1 2.2 1 22 1 
1 2:2; 1 2.2 1 22 1 
1 22 1 2.2 1 2.2 1 
1 2:2; 1 22 1 2:2 1 
1 2.2 1 2.2 1 2.2 1 
1 2:2 1 2.2 1 22 1 
1 4.5 1 2.2 1 22 1 
1 4.5 1 2.2 1 2.2 1 
1 4.5 1 22 1 2.2 1 
1 4.5 1 4.5 1 22 1 
1 4.5 1 4.5 1 2:2 1 
1 4.5 1 4.5 1 2:2; 1 
1 4.5 1 4.5 1 2:2 1 
1 4.5 1 4.5 1 2:2 1 
1 4.5 1 4.5 1 22 1 
1 4.5 1 4.5 1 22 1 
1 4.5 1 4.5 1 22 1 
1 4.5 1 4.5 1 22 1 
1 4.5 1 4.5 1 2:2 1 


(continued) 
Table 2.14 (continued) 

Rep Conc Y Rep Conc Y Rep Conc Y 
1 4.5 1 4.5 1 2.2 1 
1 4.5 1 4.5 1 2.2 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
1 4.5 1 4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 4.5 1 
4.5 1 
4.5 1 
4.5 1 
4.5 1 
4.5 1 
4.5 1 
4.5 1 
4.5 1 

Rep = replicate, Conc = concentration, Y = 1 dead / Y = 0 alive 


various countries. The extract is diluted to 20%, which is the maximum concentra- 
tion commercially available in the United States. Pyrethrin oxidizes on exposure to 
air but has been shown to be stable for long periods in water-based emulsions and oil 
concentrates. Synergistic compounds (such as piperonyl butoxide or N-octyl 
bicycloheptene dicarboximide), which enhance the effect of pyrethrin on insects, 
are present in commercially available pyrethrin formulations. The results of this 
study are shown below (Table 2.16). 


(a) List and describe the components of the GLM (distribution, systematic compo- 
nent (predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your results. 
Table 2.15 Growth of Alicyclobacillus acidoterrestris CRA7152 

pH Nisin Temp (°C) Brix Y pH Nisin Temp (°C) Brix Y 
5.5 70 50 11 0 5.5 70 50 19 0 
5.5 70 43 19 0 3.5 0 25 11 0 
5.5 50 43 13 1 5.5 70 50 11 0 
5.5 50 35 15 1 5.5 70 43 19 0 
5.5 30 35 13 1 5.5 50 43 13 1 
5.5 30 25 11 0 5.5 50 35 15 1 
5.5 0 50 19 0 5.5 30 35 13 1 
5.5 0 25 15 1 5.5 30 25 11 0 
3.5 70 43 15 0 5.5 0 50 19 0 
3.5 70 35 11 0 5.5 0 25 15 1 
3.5 50 50 13 0 3.5 70 43 15 0 
3.5 50 35 19 0 3.5 70 35 11 0 
3.5 30 50 11 0 3.5 50 50 13 0 
3.5 30 43 15 0 3.5 50 35 19 0 
3.5 0 25 19 0 3.5 30 50 11 0 
5 70 25 15 0 3.5 30 43 15 0 
5 70 25 13 0 3.5 0 25 19 0 
5 50 50 15 1 5 70 25 15 0 
5 50 25 19 0 5 70 25 13 0 
5 30 43 19 0 5 50 50 15 1 
5 30 43 11 1 5 50 25 19 0 
5 0 50 13 1 5 30 43 19 0 
5 0 35 11 1 5 30 43 11 1 
4 70 50 19 0 5 0 50 13 1 
4 70 35 13 0 5 0 35 11 1 
4 50 43 11 0 4 70 50 19 0 
4 50 25 11 0 4 70 35 13 0 
4 30 50 15 1 4 50 43 11 0 
4 30 35 19 0 4 50 25 11 0 
4 30 25 13 0 4 30 50 15 1 
4 0 43 15 1 4 30 35 19 0 
4 0 43 13 1 4 30 25 13 0 
3.5 0 35 11 0 4 0 43 15 1 
4 0 35 11 1 4 0 43 13 1 
5 0 43 11 1 3.5 0 35 11 0 
4 0 35 11 1 
5 0 43 11 1 
5.5 70 50 19 0 
3.5 0 25 11 0 
Table 2.16 Mixture: pyre- Mixur n Y 
Е 150 E: 
and Y is number of beetles 1.06 шш Е 
killed 0.75 150 22 
135 151 129 
1.03 151 65 
0.8 150 19 
33 149 143 
3.07 150 ШЕ 
29 140 37 
10.65 150 1a 
10.46 150 EM 
10.32 149 zi 
0 200 1 


Table 2.17 Results of the experiment with carbon disulfide 

Dose (mg/L) Number of exposed beetles Number of dead beetles Proportion of dead beetles 
49.1 59 6 0.102 
53 60 13 0.217 
56.9 62 18 0.29 
60.8 56 28 0.5 
64.8 63 52 0.825 
68.7 59 53 0.898 
72.6 62 61 0.984 
76.5 60 60 1 


Exercise 2.6.4 The objective of this experiment was to model the probability of 
mortality due to the toxic effect of carbon disulfide (CS₂) gas on beetles. The insects 
were exposed to various concentrations of this gas (in mg/L) for 5 hours (Bliss 1935), 
and then the number of dead beetles (Y) was counted. The data are shown below 
(Table 2.17). 


(a) List and describe the components of the GLM (distribution, systematic compo- 
nent (predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your results. 


Exercise 2.6.5 A study was conducted to assess the fowlpox virus in the 
chorioallantois by the pock counting technique. Membrane pock counts were 
recorded for 50 embryos, each exposed to one of four dilutions of the virus (multiples 
of $10^{-3.86}$). In Table 2.18, the FD column gives the dilution factor and the Count 
column gives the number of pocks observed. 

Table 2.18 Results of the fowlpox experiment 


FD Count FD Count FD Count FD Count 
0.125 1 0.25 5 0.5 5 1 12 
0.125 2 0.25 2 0.5 11 1 9 
0.125 2 0.25 3 0.5 7 1 11 
0.125 3 0.25 2 0.5 3 1 17 
0.125 2 0.25 5 0.5 4 1 11 
0.125 2 0.25 0 0.5 6 1 10 
0.125 1 0.25 2 0.5 5 1 8 
0.125 0 0.25 2 0.5 9 1 16 
0.125 0 0.25 0 0.5 4 1 15 
0.125 1 0.25 3 0.5 7 1 12 
0.125 2 0.25 2 0.5 4 
0.125 1 0.25 2 0.5 8 
0.125 1 0.5 4 
0.125 2 
0.125 1 
Table 2.19 Number of revertant Salmonella TA98 colonies 

Quinoline dose (µg/plate) 0 10 33 100 333 1000 
15 16 16 27 33 20 
21 18 26 41 38 27 
19 21 33 60 41 42 


(a) List and describe the components of the GLM (distribution, systematic component 
(predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your findings. 


Exercise 2.6.6 Data were provided by Margolin et al. (1981) from an Ames 
Salmonella reverse mutagenicity assay. Table 2.19 shows the number of revertant 
colonies observed on each of the three plates (replicates) tested at each of the six 
quinoline dose levels. The focus is on testing for mutagenic effects while accounting 
for the excess variation typically observed between counts. 


(a) List and describe the components of the GLM (distribution, systematic component 
(predictor), and the link function). 

(b) Fit the model according to part (a). 

(c) Interpret your results. 
Chapter 3 
Objectives of Inference for Stochastic Models 


Throughout this book, we have been using the acronym GLMMs to denote 
generalized linear mixed models. The common denominator among all these models 
is that they all contain a linear model (LM) part, which refers to the fixed-effects 
component of the linear predictor, $X\beta$. In a GLMM, the prefix "G" (generalized) 
indicates that the distribution of the observations may not be normal, and the first "M" 
(mixed) means that the linear predictor includes mixed effects and thus contains 
random effects, which are expressed by the term $Zb$. The fixed component $X\beta$ of 
the linear predictor is important because the fixed effects describe the treatment 
design, which, in turn, is determined by the objectives or the initial research questions 
that the study seeks to answer. Therefore, if researchers propose a reasonable model 
to analyze an experiment, they must be able to express each objective as a question 
about a model parameter or about a linear combination of model parameters. 


Example Assume a 2 × 2 factorial treatment structure, with two levels of each of the 
factors A and B, in which all possible combinations are tested. In this case, $X\beta$ 
corresponds to a two-way model with interaction and a linear predictor given by 

$$\eta_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}, \quad i, j = 1, 2$$


As in all the statistical models studied so far, the linear predictor is expressed on 
the scale of the link function, and $\eta_{ij}$ estimates the mean $\mu_{ij}$ of a treatment 
combination directly if the data follow a normal distribution and indirectly, through 
the inverse link, if the data are not normally distributed. For this example, inference 
should focus on one or more of the following estimable functions: a treatment 
combination mean; a main effect mean, such as the mean of a level of factor A 
averaged over the levels of factor B, or vice versa; the difference between main 
effect means; a simple effect, i.e., the difference between two levels of factor A at a 
given level of B, or between two levels of factor B at a given level of A; and so on. 
Each of these can be expressed in terms of the parameters of the linear predictor, as 
shown in Table 3.1. 
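The relations in Table 3.1 can be checked numerically. The Python sketch below builds the cell predictors $\eta_{ij}$ for a 2 × 2 factorial from hypothetical parameter values (chosen only for illustration, not taken from any dataset) and computes main effect means, simple effects, and the interaction contrast; with the identity link these equal the corresponding functions of the $\mu_{ij}$.

```python
# Hypothetical parameter values for a 2 x 2 factorial (identity link).
MU = 10.0
ALPHA = {1: 2.0, 2: -2.0}           # factor A effects
BETA = {1: 1.0, 2: -1.0}            # factor B effects
AB = {(1, 1): 0.5, (1, 2): -0.5,    # A x B interaction effects
      (2, 1): -0.5, (2, 2): 0.5}
LEVELS = (1, 2)


def eta(i: int, j: int) -> float:
    """Treatment combination: mu + alpha_i + beta_j + (alpha beta)_ij."""
    return MU + ALPHA[i] + BETA[j] + AB[(i, j)]


def main_effect_a(i: int) -> float:
    """Mean of level i of A, averaged over the levels of B."""
    return sum(eta(i, j) for j in LEVELS) / len(LEVELS)


def main_effect_b(j: int) -> float:
    """Mean of level j of B, averaged over the levels of A."""
    return sum(eta(i, j) for i in LEVELS) / len(LEVELS)


def simple_effect_a(j: int) -> float:
    """A | B_j: difference between the two levels of A at level j of B."""
    return eta(1, j) - eta(2, j)


def simple_effect_b(i: int) -> float:
    """B | A_i: difference between the two levels of B at level i of A."""
    return eta(i, 1) - eta(i, 2)


def interaction() -> float:
    """(eta_11 - eta_21) - (eta_12 - eta_22) = A|B_1 - A|B_2."""
    return simple_effect_a(1) - simple_effect_a(2)


if __name__ == "__main__":
    print("A main-effect difference:", main_effect_a(1) - main_effect_a(2))
    print("Simple effects of A:", simple_effect_a(1), simple_effect_a(2))
    print("Interaction:", interaction())
```

With these values the simple effects of A differ (5 versus 3), so the interaction contrast is nonzero; in that situation the main-effect difference averages over two unequal simple effects.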
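Table 3.1 Estimable functions in a factorial 2 × 2 treatment structure using the identity link function 

Target estimate | Estimator in terms of the parameters of the linear predictor | Target in terms of the expected value 
Combination A × B | $\eta_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ | $\mu_{ij}$ 
Main effect of factor A | $\eta_{i\cdot} = \mu + \alpha_i + \frac{1}{2}\sum_j \beta_j + \frac{1}{2}\sum_j (\alpha\beta)_{ij}$ | $\mu_{i\cdot} = \frac{1}{2}\sum_j \mu_{ij}$ 
Main effect of factor B | $\eta_{\cdot j} = \mu + \frac{1}{2}\sum_i \alpha_i + \beta_j + \frac{1}{2}\sum_i (\alpha\beta)_{ij}$ | $\mu_{\cdot j} = \frac{1}{2}\sum_i \mu_{ij}$ 
Difference between levels 1 and 2 of factor A | $\eta_{1\cdot} - \eta_{2\cdot} = \alpha_1 - \alpha_2 + \frac{1}{2}\sum_j (\alpha\beta)_{1j} - \frac{1}{2}\sum_j (\alpha\beta)_{2j}$ | $\mu_{1\cdot} - \mu_{2\cdot}$ 
Difference between levels 1 and 2 of factor B | $\eta_{\cdot 1} - \eta_{\cdot 2} = \beta_1 - \beta_2 + \frac{1}{2}\sum_i (\alpha\beta)_{i1} - \frac{1}{2}\sum_i (\alpha\beta)_{i2}$ | $\mu_{\cdot 1} - \mu_{\cdot 2}$ 
Simple effect of A given $B_j$ ($A \mid B_j$) | $\eta_{1j} - \eta_{2j} = \alpha_1 - \alpha_2 + (\alpha\beta)_{1j} - (\alpha\beta)_{2j}$ | $\mu_{1j} - \mu_{2j}$ 
Simple effect of B given $A_i$ ($B \mid A_i$) | $\eta_{i1} - \eta_{i2} = \beta_1 - \beta_2 + (\alpha\beta)_{i1} - (\alpha\beta)_{i2}$ | $\mu_{i1} - \mu_{i2}$ 
Interaction between factors A and B (A × B) | $(\eta_{11} - \eta_{21}) - (\eta_{12} - \eta_{22}) = (\alpha\beta)_{11} - (\alpha\beta)_{21} - (\alpha\beta)_{12} + (\alpha\beta)_{22} = (\eta_{11} - \eta_{12}) - (\eta_{21} - \eta_{22})$, i.e., $A \mid B_1 - A \mid B_2 = B \mid A_1 - B \mid A_2$ | $\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22}$ 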


Assuming that the data have a normal distribution and using the identity link 
function, the estimator in terms of the linear predictor (column 2) directly estimates 
the expected values in column 3. If the data do not follow a normal distribution, 
column 2 estimates the expected values in column 3 only indirectly, and the inverse 
link function is required to estimate the expected values. For link functions other 
than the identity, the estimates in column 2 therefore require more careful handling. 
In an experimental design with a factorial treatment structure, the analysis should 
first focus on the interaction between the two factors. If this interaction is significant, 
the simple effects are not equal and the main effects are confounded with the 
interaction, so it is better to focus on the simple effects; if the interaction is not 
significant, the main effects provide useful information. 


3.1 Three Aspects to Consider for an Inference 


When constructing a model, the researcher must decide whether the effects are fixed 
or random. This decision has important implications with respect to the estimation 
criteria and in the interpretation of the tests and estimates obtained. Given these 
implications, three important aspects, described in the following sections, must be 
taken into consideration in statistical modeling. 
3.1.1 Data Scale in the Modeling Process Versus Original Data 


This issue is specific to models with a link function other than the identity, since 
the scale of the data used in the modeling process is not always the same as the 
scale of the original data when the response variable is no longer assumed to be 
normal. When the data are normally distributed, the estimable function directly 
estimates the expected value. However, this is not true if the data follow a 
non-normal distribution. For example, in a logistic model for binomial data in a 
completely randomized design, the estimable function $\eta + \tau_i$ estimates a logit, or 
log odds. However, the expected value for individuals receiving the $i$th treatment is a 
probability, so $\eta + \tau_i$ should be reported as a probability rather than as a logit. This 
requires converting the estimate to a probability using the inverse link, that is: 

$$\pi_i = \frac{1}{1 + e^{-(\eta + \tau_i)}}$$

Thus, for link functions other than the identity, there are two ways of expressing the 
estimates: (1) in terms of the parameters directly estimated from the GLMM (model 
scale) or (2) in terms of the expected value of the response variable (data scale). 
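The conversion from the model scale to the data scale can be sketched as follows. The values of $\hat{\eta}$ and $\hat{\tau}_i$ below are hypothetical, chosen only to illustrate the inverse logit; the sketch also shows that a difference $\tau_i - \tau_{i'}$ on the logit scale corresponds to a log odds ratio rather than to a difference in probabilities.

```python
import math

# Hypothetical model-scale (logit) estimates for a logistic CRD.
ETA_HAT = 0.4                         # intercept, eta
TAU_HAT = {1: 0.5, 2: -0.8, 3: 0.0}   # treatment effects, tau_i


def inv_logit(x: float) -> float:
    """Inverse of the logit link: maps the model scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def data_scale_probability(i: int) -> float:
    """pi_i = 1 / (1 + exp(-(eta + tau_i)))."""
    return inv_logit(ETA_HAT + TAU_HAT[i])


def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio(i: int, k: int) -> float:
    """exp(tau_i - tau_k): a logit-scale difference is a log odds ratio."""
    return math.exp(TAU_HAT[i] - TAU_HAT[k])


if __name__ == "__main__":
    for i in TAU_HAT:
        print(f"trt {i}: logit = {ETA_HAT + TAU_HAT[i]:+.2f}, "
              f"probability = {data_scale_probability(i):.3f}")
    print("odds ratio trt 1 vs 2:", round(odds_ratio(1, 2), 3))
```

The intercept cancels in the odds ratio but not in the difference of probabilities, which is one reason results on the two scales must be interpreted differently.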


3.1.2 Inference Space 


This problem arises only when the linear predictor contains random effects. In these 
models, the estimates are obtained through a linear combination of the fixed effects 
(an estimable function), even though the linear predictor contains random effects. 
$K'\beta$ denotes the estimable function, where $K$ is a matrix of order $(p + 1) \times k$ and 
$\beta$ is the vector of fixed-effects parameters of order $(p + 1) \times 1$. The estimable 
function $K'\beta$ provides broad inference, as it generalizes results to the entire 
population represented by the random effects. 

By contrast, the linear combination $K'\beta + Z'b$, where $Z$ is a matrix with nonzero 
coefficients for the random effects, is a predictable function, and its inference is 
limited to the levels defined in $b$. Suppose that you are conducting an experiment 
with three treatments at different locations (L). The estimable function $\tau_1 - \tau_2$ 
provides inference about the difference between treatments 1 and 2 in the whole 
population under study. The predictable function $\tau_1 - \tau_2 + (\tau L)_{11} - (\tau L)_{21}$ also 
addresses the difference between treatments 1 and 2, but its inference is limited to 
location 1. The type of inference produced by predictable functions is called "narrow 
inference" because the nonzero coefficients in matrix $Z$ restrict the scope of 
inference from the entire population to the levels identified in $Z$. Thus, the 
predictable function $K'\beta + Z'b$ should be used for estimates specific to those levels, 
whereas the estimable function $K'\beta$ should be used for estimates valid for the entire 
population under study. 
3.1.3 Inference Based on Marginal and Conditional Models 


As mentioned in the previous chapter, a generalized linear mixed model (GLMM) is 
specified in terms of two probability distributions: (1) the distribution of the 
observations given the random effects, $y \mid b$, and (2) the distribution of the random 
effects, $b$. This two-part specification is characteristic of mixed models (MMs) in 
general, so it is valid both for Gaussian mixed models and for mixed models whose 
response variables are not normally distributed. 

From probability theory, the marginal distribution of the data $y$ is obtained by 
integrating the joint distribution of $y$ and $b$ over the random effects: 

$$f(y) = \int f(y \mid b)\, f(b)\, db$$

Of the two distributions of the data (conditional and marginal), only the marginal 
distribution can be known and observed. Many non-Gaussian mixed models that 
seem reasonable do not distinguish between the distributions of $y \mid b$ and $y$; 
models that do not make this distinction are called marginal models. Estimates 
obtained from marginal models have different expected values from those produced 
by conditional models; therefore, marginal models are not estimated in the same way 
as conditional models. 
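The difference between conditional and marginal expected values can be seen with a logit link and a normally distributed random intercept, $b \sim N(0, \sigma_b^2)$. The Python sketch below integrates the inverse link over the random effect with a simple midpoint rule, which is a numerical stand-in for the integral above; the values of $\eta$ and $\sigma_b$ are illustrative assumptions. The marginal mean $E[y]$ is pulled toward 0.5 relative to the conditional mean at $b = 0$.

```python
import math
from statistics import NormalDist


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def conditional_mean(eta: float, b: float = 0.0) -> float:
    """E[y | b] = inv_logit(eta + b)."""
    return inv_logit(eta + b)


def marginal_mean(eta: float, sigma_b: float, steps: int = 4000, width: float = 8.0) -> float:
    """E[y] = integral of inv_logit(eta + b) * f(b) db with b ~ N(0, sigma_b**2).

    Midpoint rule on [-width * sigma_b, width * sigma_b]; requires sigma_b > 0.
    """
    density = NormalDist(0.0, sigma_b)
    lower = -width * sigma_b
    h = 2.0 * width * sigma_b / steps
    total = 0.0
    for k in range(steps):
        b = lower + (k + 0.5) * h
        total += conditional_mean(eta, b) * density.pdf(b) * h
    return total


if __name__ == "__main__":
    eta, sigma_b = 1.0, 1.0
    print("conditional mean at b = 0:", round(conditional_mean(eta), 4))
    print("marginal mean:", round(marginal_mean(eta, sigma_b), 4))
```

Because the inverse logit is nonlinear, averaging it over $b$ is not the same as evaluating it at the average $b = 0$; with the identity link the two would coincide.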


3.2 Illustrative Examples of the Data Scale 
and the Model Scale 


In linear models, inference begins with the estimable function $K'\beta$. These 
models are defined in terms of the linear predictor $\eta = g(\mu) = X\beta$ (if there are no 
random effects) or $\eta = X\beta + Zb$ (if the model contains random effects), so $K'\beta$ 
produces results on the scale of the link function. 

For linear models with a normal response, such as LMs and LMMs, the link function 
is not visible because they use the identity link. Linear combinations of model 
parameters directly estimate the desired quantities, such as differences between 
treatments, and many other hypothesis tests of interest. Inference for an LM is 
therefore straightforward. 

For GLMs and GLMMs with a non-normal response, estimating $K'\beta$ yields a linear 
combination of elements of the linear predictor $\eta = g(\mu)$, which is typically a 
nonlinear function of $\mu$. For example, with Poisson data, $K'\beta$ is on the log scale, 
and, with binomial data, it is on the logit or probit scale. Most of the time, however, 
the researcher wants binomial results expressed as the probability of the outcome of 
interest and Poisson results expressed as counts. Because both GLMs and GLMMs 
carry out estimation on the model scale (determined by the link), reporting the 
results of interest on the data scale requires applying the inverse link to the estimates 
on the model scale. For example, with the logit link for the binomial model, the 
results are expressed as probabilities, and, with the log link for the 
Table 3.2 Percentage of germinated seeds (Y) out of total seeds (N) 

Treatment (Trt) Y (no. of germinated seeds) N (total no. of seeds) 
Trt1 54 70 
Trt1 41 60 
Trt1 52 70 
Trt2 28 70 
Trt2 22 60 
Trt2 21 70 
Trt3 41 70 
Trt3 37 60 
Trt3 47 70 


Poisson model, they are expressed in terms of counts. To exemplify the model scale 
and the data scale, an example is shown below. 


Example 3.1 Consider the following experiment in which three chemical seed coat 
softeners were tested to study their effect on the germination of tomato seeds in 
Styrofoam trays (Table 3.2). 


To illustrate these two concepts, we first analyze the data as a completely 
randomized design (CRD) assuming a normal response variable, and then we 
analyze the same design with a binomial response variable. We are interested in 
comparing the treatment means. Note that, for demonstration purposes, we assume 
that Y has a normal distribution, when in fact it has a binomial distribution. 

The components of this model are defined as follows: 

Distribution: $y_{ij} \sim N(\mu_i, \sigma^2)$ 
Linear predictor: $\eta_i = \eta + \tau_i$ ($i = 1, 2, 3$) 
Link function: $\eta_i = \mu_i$ (identity link) 


The analysis of variance (ANOVA) (part (a)) and estimated parameters (part (b)) 
of this experimental design indicate that there is a highly significant difference 
between the treatments (P = 0.0033) for the germination of tomato seeds. 
Table 3.3 shows part of the results. 
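The entries of Table 3.3 can be reproduced by hand from the counts in Table 3.2 with a one-way ANOVA. The Python sketch below (standard library only) treats Y as normal, as the text does for demonstration, and computes the treatment means, the mean squared error reported as "Scale," the F-value, and the standard errors of a treatment mean and of a difference between two means; `one_way_anova` is an illustrative helper, not a SAS routine.

```python
import math
from statistics import mean

# Germinated-seed counts (Y) from Table 3.2, treated as normal for illustration.
DATA = {
    "Trt1": [54, 41, 52],
    "Trt2": [28, 22, 21],
    "Trt3": [41, 37, 47],
}


def one_way_anova(groups):
    """Return treatment means, MSE, F-value, and degrees of freedom for a CRD."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_obs)
    means = {g: mean(ys) for g, ys in groups.items()}
    ss_trt = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_err = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_trt = len(groups) - 1
    df_err = len(all_obs) - len(groups)
    mse = ss_err / df_err  # reported as "Scale"
    f_value = (ss_trt / df_trt) / mse
    return means, mse, f_value, df_trt, df_err


if __name__ == "__main__":
    means, mse, f_value, df_trt, df_err = one_way_anova(DATA)
    r = 3  # replicates per treatment
    print({g: round(m, 4) for g, m in means.items()})
    print(f"MSE = {mse:.4f}, F({df_trt}, {df_err}) = {f_value:.2f}")
    print(f"SE(mean) = {math.sqrt(mse / r):.4f}, SE(diff) = {math.sqrt(2 * mse / r):.4f}")
```

The standard error of a treatment mean, $\sqrt{\hat{\sigma}^2/3}$, matches the standard error of the intercept in Table 3.3, and $\sqrt{2\hat{\sigma}^2/3}$ matches the standard error of each treatment effect, because those parameters are a treatment mean and differences between two treatment means, respectively.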

Table 3.3 shows the estimated model parameters (obtained with the “solution” 
option); the effect of treatment 3 is set to zero because the model is 
over-parameterized. The estimable functions $K'\beta$ for the treatment means are as 
follows: 
Table 3.3 Results of the analysis of variance using a CRD 

(a) Type III tests of fixed effects 
Effect Num DF Den DF F-value Pr > F 
Trt 2 6 17.25 0.0033 

(b) Parameter estimates 
Effect Trt Estimate Standard error DF t-value Pr > |t| 
Intercept $\hat{\eta} = 41.6667$ 3.1388 6 13.27 <0.0001 
Trt Trt1 $\hat{\tau}_1 = 7.3333$ 4.4389 6 1.65 0.1496 
Trt Trt2 $\hat{\tau}_2 = -18.0000$ 4.4389 6 −4.06 0.0067 
Trt Trt3 $\hat{\tau}_3 = 0$ . . . . 
Scale 29.5556 17.0639 


From the estimated parameters, we can obtain the estimated mean of each treatment 
($i = 1, 2, 3$) as $\hat{\mu}_i = \hat{\eta} + \hat{\tau}_i$: for treatment 1, $\hat{\mu}_1 = 41.6667 + 7.3333 = 49$; for 
treatment 2, $\hat{\mu}_2 = 41.6667 - 18 = 23.6667$; and for treatment 3, 
$\hat{\mu}_3 = 41.6667 + 0 = 41.6667$. The value of the mean squared error ($\hat{\sigma}^2$), 
which appears in the table as “Scale,” is 29.5556. 

For the difference between treatments, the т; - z; values for i # Гаге as follows: 
Tı — = + — (ñ + )= – 72 = 7.3333 — Ը 18) = 25.3333,7, — тз = + #1 
= (ñ + 23) = — = 7.333 — 0.0 = 7.3333, and T2 — тз = + 22 (դ | 23) =ù 

73 = — 18.00 — 0.0 = - 18.0. These estimates can be obtained using the Statis- 
tical Analysis Software (SAS) “estimate” and “Ismeans” commands, as shown 
below: 


proc glimmix data=germi;
   class trt;
   model y = trt / solution;          /* normal response, identity link */
   lsmeans trt / diff e;              /* means, pairwise differences, coefficients */
   estimate 'lsm trt 1' intercept 1 trt 1 0 0;
   estimate 'lsm trt 2' intercept 1 trt 0 1 0;
   estimate 'lsm trt 3' intercept 1 trt 0 0 1;
   /* two equivalent ways of writing the overall mean */
   estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333;
   estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3;
   estimate 'trt diff 1&2' trt 1 -1 0;
   estimate 'trt diff 2&3' trt 0 1 -1;
run;


The “estimate” command requires us to specify what we wish to estimate: “intercept” refers to the intercept ($\eta$) and “trt” to the treatment effects ($\tau_i$) under evaluation, each followed by the coefficients of the estimable function, as shown above. The “lsmeans” command asks GLIMMIX to estimate the treatment means; the “diff” option requests the differences between pairs of treatments, and the “e” option displays the coefficients of the estimable functions used by “lsmeans.” Part of the output of the above code is shown in Table 3.4.

Table 3.4 Results obtained using the “estimate” and “lsmeans” commands

(a) Differences of Trt least squares means

Trt     _Trt    Estimate    Standard error    DF    t-value    Pr > |t|
Trt1    Trt2     25.3333    4.4389            6      5.71      0.0013
Trt1    Trt3      7.3333    4.4389            6      1.65      0.1496
Trt2    Trt3    -18.0000    4.4389            6     -4.06      0.0067

(b) Estimates

Label           Estimate    Standard error    DF    t-value    Pr > |t|
LSM Trt 1        49.0000    3.1388            6     15.61      <0.0001
LSM Trt 2        23.6667    3.1388            6      7.54      0.0003
LSM Trt 3        41.6667    3.1388            6     13.27      <0.0001
Overall mean     38.1111    1.8122            6     21.03      <0.0001
Overall mean     38.1111    1.8122            6     21.03      <0.0001
Trt diff 1&2     25.3333    4.4389            6      5.71      0.0013
Trt diff 2&3    -18.0000    4.4389            6     -4.06      0.0067
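Each “estimate” statement is a coefficient vector $k$ applied to the solution vector, $k'\hat{\beta}$. The Python sketch below mimics that calculation with the rounded solution of Table 3.3(b), so the last digit may differ slightly from Table 3.4; the standard-error formulas at the end hold only for this balanced CRD with three replicates per treatment, and all names are illustrative.

```python
# Fixed-effects solution from Table 3.3(b): (intercept, tau1, tau2, tau3).
# tau3 = 0 because the model is over-parameterized.
beta = [41.6667, 7.3333, -18.0, 0.0]
mse = 29.5556  # "Scale" in Table 3.3(b)
r = 3          # replicates per treatment


def estimate(k, b=beta):
    """Evaluate the estimable function k'beta."""
    return sum(ki * bi for ki, bi in zip(k, b))


# Coefficient vectors mirroring the 'estimate' statements in the SAS code.
k_vectors = {
    "lsm trt 1": [1, 1, 0, 0],
    "lsm trt 2": [1, 0, 1, 0],
    "lsm trt 3": [1, 0, 0, 1],
    "overall mean": [1, 1 / 3, 1 / 3, 1 / 3],
    "trt diff 1&2": [0, 1, -1, 0],
    "trt diff 2&3": [0, 0, 1, -1],
}
for label, k in k_vectors.items():
    print(f"{label:<14} {estimate(k):9.4f}")

# Standard errors in this balanced CRD (compare with Table 3.4).
se_mean = (mse / r) ** 0.5           # one treatment mean
se_diff = (2 * mse / r) ** 0.5       # difference of two treatment means
se_overall = (mse / (3 * r)) ** 0.5  # mean of all nine observations
```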

Next, we analyze the same data, again as a CRD, but now assuming a binomial distribution for the response variable. $N_{ij}$ denotes the number of independent Bernoulli trials (seeds) in the ijth observation. The components of the model are as follows:

Distribution: $y_{ij} \sim \text{Binomial}(N_{ij}, \pi_i)$
Linear predictor: $\eta_i = \eta + \tau_i \quad (i = 1, 2, 3)$
Link function: $\eta_i = \text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right)$ (logit link)

Fitting these data with a binomial model gives the fixed-effects solution on the model scale shown in Table 3.5.

The above results were obtained using the following SAS code: 


proc glimmix data=germi;
   class trt;
   /* events/trials syntax: binomial response with the logit link by default */
   model y/n = trt / solution;
run;


As in the previous example, we can estimate the treatment means and the differences between pairs of treatments. The estimated linear predictors for the treatments are

$$\hat{\eta}_1 = \hat{\eta} + \hat{\tau}_1 = 0.5108 + 0.5093 = 1.0201, \quad \hat{\eta}_2 = \hat{\eta} + \hat{\tau}_2 = 0.5108 - 1.1080 \approx -0.5971, \quad \hat{\eta}_3 = \hat{\eta} + \hat{\tau}_3 = 0.5108 + 0 = 0.5108$$

and the differences between treatments 1 and 2, 1 and 3, and 2 and 3 are, respectively,

$$\hat{\eta}_1 - \hat{\eta}_2 = 1.0201 - (-0.5971) \approx 1.6173, \quad \hat{\eta}_1 - \hat{\eta}_3 = 1.0201 - 0.5108 = 0.5093, \quad \hat{\eta}_2 - \hat{\eta}_3 = -0.5971 - 0.5108 = -1.1079$$

(discrepancies in the last digit are due to rounding of the reported estimates).
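Because each treatment has its own parameter, the maximum likelihood estimate of $\pi_i$ in this binomial CRD is the pooled proportion of germinated seeds, $\hat{\pi}_i = \sum_j y_{ij} / \sum_j N_{ij}$, and the model-scale solution follows by taking logits. The Python sketch below assumes that the counts and seed numbers are those listed in Table 3.8 (the data of Table 3.2 are not reproduced in this section); it is a hand check, not a reimplementation of GLIMMIX.

```python
import math

# (germinated, total seeds) per replicate. ASSUMPTION: same values as Table 3.8.
data = {
    "Trt1": [(54, 70), (41, 60), (52, 70)],
    "Trt2": [(28, 70), (22, 60), (21, 70)],
    "Trt3": [(41, 70), (37, 60), (47, 70)],
}


def logit(p):
    """Logit link: log of the odds."""
    return math.log(p / (1 - p))


# Pooled proportion per treatment = ML estimate of pi_i in this one-factor model.
pi_hat = {k: sum(y for y, _ in v) / sum(n for _, n in v) for k, v in data.items()}
eta_hat = {k: logit(p) for k, p in pi_hat.items()}

intercept = eta_hat["Trt3"]          # last level is the reference (tau3 = 0)
tau1 = eta_hat["Trt1"] - intercept
tau2 = eta_hat["Trt2"] - intercept
print(round(intercept, 4), round(tau1, 4), round(tau2, 4))  # 0.5108 0.5093 -1.108
```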
Table 3.5 Estimated parameters at the model scale

Parameter estimates

Effect           Estimate    Standard error    DF    t-value    Pr > |t|
Intercept (η)      0.5108    0.1461            6      3.50      0.0129
Trt1 (τ1)          0.5093    0.2168            6      2.35      0.0571
Trt2 (τ2)         -1.1080    0.2078            6     -5.33      0.0018
Trt3 (τ3)          0         .                 .      .         .


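Using the relationship between the linear predictor and the link function, $\eta_i = \text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right)$, we can estimate the probability of observing a favorable outcome (a germinated seed) for each treatment, that is, $\pi_1$, $\pi_2$, and $\pi_3$. Applying the inverse link, we obtain

$$\hat{\pi}_1 = \frac{1}{1 + e^{-(\hat{\eta} + \hat{\tau}_1)}}, \quad \hat{\pi}_2 = \frac{1}{1 + e^{-(\hat{\eta} + \hat{\tau}_2)}}, \quad \text{and} \quad \hat{\pi}_3 = \frac{1}{1 + e^{-(\hat{\eta} + \hat{\tau}_3)}}$$

Substituting the corresponding values, we obtain

$$\hat{\pi}_1 = \frac{1}{1 + e^{-1.0201}} = 0.735, \quad \hat{\pi}_2 = \frac{1}{1 + e^{0.5971}} = 0.355, \quad \text{and} \quad \hat{\pi}_3 = \frac{1}{1 + e^{-0.5108}} = 0.625$$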
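The inverse-link calculation above can be checked numerically. A minimal Python sketch using the rounded estimates of Table 3.5; the helper name `inv_logit` is our own.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: model scale -> probability (data scale)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Linear predictors eta_i = intercept + tau_i from the rounded estimates of Table 3.5.
intercept = 0.5108
tau = {"Trt1": 0.5093, "Trt2": -1.1080, "Trt3": 0.0}
pi_hat = {trt: inv_logit(intercept + t) for trt, t in tau.items()}
print({trt: round(p, 3) for trt, p in pi_hat.items()})
# {'Trt1': 0.735, 'Trt2': 0.355, 'Trt3': 0.625}
```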


Here, we can see that treatment 1 has the highest probability of success, followed by treatment 3, whereas treatment 2 has the lowest. For the difference between two treatments, $\tau_i - \tau_{i'}$ for $i \neq i'$ estimates the logarithm of the odds ratio:

$$\tau_i - \tau_{i'} = \log\left(\frac{\pi_i}{1 - \pi_i}\right) - \log\left(\frac{\pi_{i'}}{1 - \pi_{i'}}\right) = \log\left(\frac{\pi_i / (1 - \pi_i)}{\pi_{i'} / (1 - \pi_{i'})}\right)$$

where, in this particular case, $\text{odds}_i = \pi_i / (1 - \pi_i)$ is the odds of treatment $i$ and $\text{odds}_i / \text{odds}_{i'}$ is the odds ratio for treatments $i$ and $i'$, $i \neq i'$. Exponentiating the log odds ratio gives the odds ratio:

$$\text{OR}_{ii'} = e^{\tau_i - \tau_{i'}}$$

The odds ratio for treatments 1 and 3 is therefore

$$\text{OR}_{13} = e^{\hat{\tau}_1 - \hat{\tau}_3} = e^{0.5093} \approx 1.664$$

Similarly, for treatments 1 and 2 and for treatments 2 and 3, the odds ratios are $\text{OR}_{12} = e^{1.6173} \approx 5.039$ and $\text{OR}_{23} = e^{-1.1080} \approx 0.330$, respectively, in agreement with Table 3.6 and the “Exponentiated estimate” column of Table 3.7. Applying the inverse logit to $\hat{\tau}_i - \hat{\tau}_{i'}$ instead gives 0.8344, 0.6246, and 0.2483 (the “Mean” values of the differences in Table 3.7), which are not odds ratios. It is also important to note that an odds ratio is not the difference $\pi_i - \pi_{i'}$ between the probabilities of two treatments.
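A short Python check of the back-transformation of the log odds ratios $\hat{\tau}_i - \hat{\tau}_{i'}$ reported in Table 3.7: exponentiating reproduces the odds ratios of Table 3.6 and the “Exponentiated estimate” column of Table 3.7, whereas the inverse logit of the same differences reproduces the “Mean” values listed for the differences. Values agree with the tables up to rounding; the dictionary names are illustrative.

```python
import math

# Estimated log odds ratios tau_i - tau_i' (model scale), as reported in Table 3.7.
log_odds_ratio = {"1 vs 2": 1.6173, "1 vs 3": 0.5093, "2 vs 3": -1.1080}

# Correct data-scale quantity: exponentiate the log odds ratio.
odds_ratio = {pair: math.exp(d) for pair, d in log_odds_ratio.items()}

# Inverse logit of the same difference: NOT an odds ratio; shown for comparison only.
ilink_of_difference = {pair: 1 / (1 + math.exp(-d)) for pair, d in log_odds_ratio.items()}

for pair in log_odds_ratio:
    print(f"Trt {pair}: odds ratio = {odds_ratio[pair]:.4f}, "
          f"inverse logit of difference = {ilink_of_difference[pair]:.4f}")
```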

From the previous example, it is clear that when the response variable is not normal, parameter estimation and inference occur at two levels. The linear predictor $X\beta$ and the estimable function $K'\beta$ are expressed in terms of the link function (here, the logit); that is, they are estimates on the model scale, as in the above example. Under the logit link, the log odds (for a treatment) and the difference of log odds (the log odds ratio, for a pair of treatments) are very common and useful quantities in categorical data analysis; on the data scale, they are reported as probabilities and odds ratios, respectively.

Estimates on the model scale of a GLM are often not easy to interpret, so the data scale plays an important role. Working on the data scale involves applying the inverse of the link function to the estimable function $K'\beta$, as we did in the previous example to convert the log odds of each treatment into a probability. In general, the inverse link is used to transform model-scale estimates of means to the data scale. It is not used for differences between treatments: because link functions are generally nonlinear, the inverse link of a difference is not the difference of the back-transformed means, and applying it produces meaningless results.

Thus, in the logit model, there are two ways to express a difference between treatments on the data scale. First, we can apply the inverse link to the linear predictor of each treatment and then take the difference between the probabilities, $\hat{\pi}_i - \hat{\pi}_{i'}$; that is, we estimate $\pi_i - \pi_{i'}$ through

$$\frac{1}{1 + e^{-(\hat{\eta} + \hat{\tau}_i)}} - \frac{1}{1 + e^{-(\hat{\eta} + \hat{\tau}_{i'})}}$$

and not as $1/\left(1 + e^{-(\hat{\tau}_i - \hat{\tau}_{i'})}\right)$. Second, since $\tau_i - \tau_{i'}$ estimates the log odds ratio, $e^{\hat{\tau}_i - \hat{\tau}_{i'}}$ gives an estimate of the odds ratio. Both approaches are valid, and the choice between them depends on the requirements of the particular study.
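The first approach (back-transform each treatment, then subtract) can be sketched in Python with the rounded estimates of Table 3.5; the helper names are illustrative and not part of any SAS output.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


intercept = 0.5108
tau = {1: 0.5093, 2: -1.1080, 3: 0.0}  # rounded estimates, Table 3.5


def prob_difference(i, j):
    """Back-transform each linear predictor, then subtract the probabilities."""
    return inv_logit(intercept + tau[i]) - inv_logit(intercept + tau[j])


for i, j in [(1, 2), (1, 3), (2, 3)]:
    print(f"pi{i} - pi{j} = {prob_difference(i, j):.3f}")
# approximately 0.380, 0.110 and -0.270
```

These are differences of probabilities (0.735 - 0.355, and so on); they are a different summary from the odds ratios, not an approximation of them.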

With the GLIMMIX procedure, results on the data scale are obtained with the “ilink,” “exp,” and “oddsratio” options, as shown in the following SAS code:


proc glimmix data=germi;
   class trt;
   model y/n = trt / solution oddsratio;
   lsmeans trt / diff oddsratio ilink;
   /* treatment means on the model scale and, with ilink, on the data scale */
   estimate 'lsm trt 1' intercept 1 trt 1 0 0 / ilink;
   estimate 'lsm trt 2' intercept 1 trt 0 1 0 / ilink;
   estimate 'lsm trt 3' intercept 1 trt 0 0 1 / ilink;
   estimate 'overall mean' intercept 1 trt 0.33333 0.33333 0.33333 / ilink;
   estimate 'overall mean' intercept 3 trt 1 1 1 / divisor=3 ilink;
   /* treatment differences (log odds ratios) */
   estimate 'trt diff 1&2' trt 1 -1 0 / oddsratio ilink;
   estimate 'trt diff 1&3' trt 1 0 -1 / oddsratio ilink;
   estimate 'trt diff 2&3' trt 0 1 -1 / oddsratio ilink;
   estimate 'trt diff 1&2' trt 1 -1 0 / exp;
   estimate 'trt diff 1&3' trt 1 0 -1 / exp;
   estimate 'trt diff 2&3' trt 0 1 -1 / exp;
run;


Part of the output of “proc GLIMMIX” is shown in Table 3.6. The “Odds ratio estimates” (part (a)) result from the “oddsratio” option in the “model” statement of the previous program; their 95% confidence limits are provided by default.

In part (b), the values under “Estimate” are on the model scale, $\hat{\eta}_i = \hat{\eta} + \hat{\tau}_i$, and the values under “Mean” are the inverse link applied to them, $\hat{\pi}_i = 1/\left(1 + e^{-(\hat{\eta} + \hat{\tau}_i)}\right)$; in this case, they are probabilities on the data scale. Similarly, for the differences between treatments, the values under “Estimate” are on the model scale, $\hat{\tau}_i - \hat{\tau}_{i'}$, whereas the “Odds ratio” values were computed as $e^{\hat{\tau}_i - \hat{\tau}_{i'}}$ and are on the data scale.

In Table 3.7, the odds ratio appears in the “Exponentiated estimate” column regardless of whether the “oddsratio” or the “exp” option is used in the “estimate” command. For the overall mean, the inverse link applied to $\hat{\eta} + \frac{1}{3}(\hat{\tau}_1 + \hat{\tau}_2 + \hat{\tau}_3) = 0.3113$ is 0.5772, which differs from the average of the $\hat{\pi}_i$, that is, $\frac{1}{3}(\hat{\pi}_1 + \hat{\pi}_2 + \hat{\pi}_3) = \frac{1}{3}(0.735 + 0.355 + 0.625) = 0.5717$. This illustrates that we have to be careful when using the output of proc GLIMMIX: it can report results on both the model scale and the data scale through the inverse link, but the inverse link must be applied appropriately; otherwise, the results are meaningless.
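The two “overall means” can be reproduced with a few lines of Python, using the rounded estimates of Table 3.5: one is the inverse link of the averaged linear predictor, the other the average of the back-transformed treatment probabilities. The results agree with 0.3113, 0.5772, and 0.5717 up to rounding; variable names are illustrative.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


intercept = 0.5108
taus = [0.5093, -1.1080, 0.0]  # rounded estimates, Table 3.5

# (1) Inverse link of the averaged linear predictor ("overall mean" with ilink).
eta_bar = intercept + sum(taus) / len(taus)
ilink_of_mean = inv_logit(eta_bar)

# (2) Average of the three back-transformed treatment probabilities.
mean_of_probs = sum(inv_logit(intercept + t) for t in taus) / len(taus)

print(round(eta_bar, 4), round(ilink_of_mean, 4), round(mean_of_probs, 4))
```

Because the inverse logit is nonlinear, the inverse link of an average is not the average of the inverse links; neither number is wrong, but they answer different questions.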


Example 3.2: Randomized complete block design (RCBD) with normal 
and binomial responses 

Now, assume that the same experiment was conducted as an RCBD, with the three treatments tested in each of the three blocks (Table 3.8).

In this example, the data are first analyzed assuming a normal response with fixed block effects and then assuming a binomial response.

The model components under a Gaussian response variable are as follows:

Distribution: $y_{ij} \sim N(\mu_{ij}, \sigma^2)$
Linear predictor: $\eta_{ij} = \eta + \tau_i + block_j \quad (i, j = 1, 2, 3)$
Link function: $\eta_{ij} = \mu_{ij}$ (identity link)


From the theory of linear models, we know that the mean of the ith treatment can be estimated through

$$\mu_{i\cdot} = \frac{1}{3}\sum_{j=1}^{3} \mu_{ij} = \eta + \tau_i + \frac{1}{3}\sum_{j=1}^{3} block_j = \eta + \tau_i + \overline{block}_{\cdot}$$
Table 3.6 Results of the “ilink,” “exp,” and “oddsratio” options

(a) Odds ratio estimates

Trt     _Trt    Estimate    DF    95% confidence limits
Trt1    Trt3    1.664       6     0.979    2.829
Trt2    Trt3    0.330       6     0.199    0.549

(b) Trt least squares means

Trt     Estimate    Standard error    DF    t-value    Pr > |t|    Mean      Standard error mean
Trt1     1.0201     0.1602            6      6.37      0.0007      0.7350    0.03121
Trt2    -0.5971     0.1478            6     -4.04      0.0068      0.3550    0.03384
Trt3     0.5108     0.1461            6      3.50      0.0129      0.6250    0.03423

(c) Differences of Trt least squares means

Trt     _Trt    Estimate    Standard error    DF    t-value    Pr > |t|    Odds ratio
Trt1    Trt2     1.6173     0.2180            6      7.42      0.0003      5.039
Trt1    Trt3     0.5093     0.2168            6      2.35      0.0571      1.664
Trt2    Trt3    -1.1080     0.2078            6     -5.33      0.0018      0.330

Table 3.7 Estimates at the model scale and at the data scale

Label           Estimate    Standard error    DF    t-value    Pr > |t|    Mean      Standard error mean    Exponentiated estimate
LSM Trt 1        1.0201     0.1602            6      6.37      0.0007      0.7350    0.03121                .
LSM Trt 2       -0.5971     0.1478            6     -4.04      0.0068      0.3550    0.03384                .
LSM Trt 3        0.5108     0.1461            6      3.50      0.0129      0.6250    0.03423                .
Overall mean     0.3113     0.08746           6      3.56      0.0119      0.5772    0.02134                .
Trt diff 1&2     1.6173     0.2180            6      7.42      0.0003      0.8344    0.03011                5.0393
Trt diff 1&3     0.5093     0.2168            6      2.35      0.0571      0.6246    0.05083                1.6642
Trt diff 2&3    -1.1080     0.2078            6     -5.33      0.0018      0.2483    0.03878                0.3302
Trt diff 1&2     1.6173     0.2180            6      7.42      0.0003      .         .                      5.0393
Trt diff 1&3     0.5093     0.2168            6      2.35      0.0571      .         .                      1.6642
Trt diff 2&3    -1.1080     0.2078            6     -5.33      0.0018      .         .                      0.3302
Table 3.8 Number of germinated seeds (Y) out of total seeds (N) in a randomized complete block design

Treatment    Block     Y (no. of germinated seeds)    N (total no. of seeds)
Trt1         Block1    54                             70
Trt1         Block2    41                             60
Trt1         Block3    52                             70
Trt2         Block1    28                             70
Trt2         Block2    22                             60
Trt2         Block3    21                             70
Trt3         Block1    41                             70
Trt3         Block2    37                             60
Trt3         Block3    47                             70

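In the expression for $\mu_{i\cdot}$ above, $\overline{block}_{\cdot} = \frac{1}{3}\sum_{j=1}^{3} block_j$ is the average block effect. The difference between the means of two treatments $i$ and $i'$ is estimated as

$$\mu_{i\cdot} - \mu_{i'\cdot} = (\eta + \tau_i + \overline{block}_{\cdot}) - (\eta + \tau_{i'} + \overline{block}_{\cdot}) = \tau_i - \tau_{i'}$$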


The goal of this experiment could be to compare the treatment means, that is, to test $\mu_{1\cdot} = \mu_{2\cdot} = \mu_{3\cdot}$, or equivalently $\tau_1 = \tau_2 = \tau_3$, or to compare one treatment with the average of the other treatments, for example, treatment 1 with the average of treatments 2 and 3 (Trt1.vs.average.Trt2.and.Trt3).

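For the hypothesis test of the equality of treatments ($\tau_1 = \tau_2 = \tau_3$), the estimable function $K'\beta$ is given by

$$K'\beta = \begin{bmatrix} 0 & 1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \eta \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ block_1 \\ block_2 \\ block_3 \end{bmatrix} = \begin{bmatrix} \tau_1 - \tau_3 \\ \tau_2 - \tau_3 \end{bmatrix}$$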


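For the contrasts Trt1.vs.average.Trt2.and.Trt3 and Trt2.vs.average.Trt1.and.Trt3, $K'\beta$ is given by

$$\text{Trt1.vs.average.Trt2.and.Trt3} = \mu_{1\cdot} - \tfrac{1}{2}(\mu_{2\cdot} + \mu_{3\cdot}) = \tau_1 - \frac{\tau_2 + \tau_3}{2}$$

$$\text{Trt2.vs.average.Trt1.and.Trt3} = \mu_{2\cdot} - \tfrac{1}{2}(\mu_{1\cdot} + \mu_{3\cdot}) = \tau_2 - \frac{\tau_1 + \tau_3}{2}$$

that is,

$$K'\beta = \begin{bmatrix} 0 & 1 & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0 \\ 0 & -\tfrac{1}{2} & 1 & -\tfrac{1}{2} & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \eta \\ \tau_1 \\ \tau_2 \\ \tau_3 \\ block_1 \\ block_2 \\ block_3 \end{bmatrix}$$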
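Because the coefficients of each contrast sum to zero, the contrast can be evaluated directly on the treatment means. The pure-Python sketch below assumes the treatment means of Table 3.10(b) (exact values 49, 71/3, and 125/3) and the MSE of Table 3.9, and uses the balanced-RCBD result $\text{Cov}(c_a'\bar{y}, c_b'\bar{y}) = \hat{\sigma}^2 (c_a \cdot c_b)/r$. It handles one or two contrast rows only; the general form of the statistic is $(K'\hat{\beta})'\left[K'\widehat{\text{Var}}(\hat{\beta})K\right]^{-1}K'\hat{\beta}/\text{rank}(K)$.

```python
# Treatment means (Table 3.10(b)) and MSE ("Scale", Table 3.9) for the RCBD
# with a normal response; r is the number of blocks.
means = [49.0, 71 / 3, 125 / 3]
mse, r = 18.2778, 3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contrast_f(rows):
    """F statistic for H0: C mu = 0 with one or two contrast rows (balanced RCBD)."""
    est = [dot(row, means) for row in rows]
    if len(rows) == 1:
        return est[0] ** 2 / (mse * dot(rows[0], rows[0]) / r)
    if len(rows) != 2:
        raise ValueError("this sketch handles one or two rows only")
    a = mse * dot(rows[0], rows[0]) / r
    b = mse * dot(rows[0], rows[1]) / r
    d = mse * dot(rows[1], rows[1]) / r
    # Quadratic form est' V^{-1} est for V = [[a, b], [b, d]], divided by its df (2).
    q = (d * est[0] ** 2 - 2 * b * est[0] * est[1] + a * est[1] ** 2) / (a * d - b * b)
    return q / 2


f_trt1_vs_avg = contrast_f([[2, -1, -1]])           # about 29.19
f_trt2_vs_avg = contrast_f([[-1, 2, -1]])           # about 51.37
f_type3_a = contrast_f([[1, 0, -1], [0, 1, -1]])    # about 27.89
f_type3_b = contrast_f([[2, -1, -1], [-1, 2, -1]])  # same hypothesis, same F
```

The two 2-row matrices span the same hypothesis ($\tau_1 = \tau_2 = \tau_3$), so they give the same F-value, which is why the two “type 3” contrasts in the SAS program agree.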


The following GLIMMIX procedure allows us to implement the above example. 


proc glimmix;
   class trt block;
   model y = trt block / solution;
   lsmeans trt / diff e;
   estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1 / divisor=3;
   estimate 'overall mean' intercept 3 trt 1 1 1 block 1 1 1 / divisor=3;
   estimate 'average trt1&trt2' intercept 6 trt 3 3 0 block 2 2 2 / divisor=6;
   estimate 'average trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3 / divisor=9;
   estimate 'trt1 vs trt2' trt 1 -1 0;
   estimate 'trt1 vs trt3' trt 1 0 -1;
   estimate 'trt2 vs trt3' trt 0 1 -1;
   /* several estimable functions in one statement, with Sidak adjustment */
   estimate 'trt1 vs trt2' trt 1 -1 0,
            'trt1 vs trt3' trt 1 0 -1,
            'trt2 vs trt3' trt 0 1 -1 / adjust=sidak;
   contrast 'trt1 vs trt2' trt 1 -1 0;
   contrast 'trt1 vs trt3' trt 1 0 -1;
   contrast 'trt2 vs trt3' trt 0 1 -1;
   contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
   contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
   contrast 'type 3 trt ss' trt 1 0 -1, trt 0 1 -1;
   contrast 'type 3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;


Part of the GLIMMIX output is shown in Table 3.9. Estimates are shown for treatments 1 and 2 and blocks 1 and 2, whereas those for treatment 3 and block 3 are set to zero. This is because the model is not of full rank: the estimation uses a generalized inverse obtained through the SAS sweep operator, which sets the last level of each classification effect equal to zero (Table 3.9).

The “Coefficients” (part (a) of Table 3.10), obtained with the “e” option of “lsmeans,” show how SAS combines the parameter solution to calculate the treatment means (part (b)): each mean uses the intercept, the effect of its own treatment, and one-third of each block effect. Part (c) shows the differences of means obtained with the “diff” option of “lsmeans.”
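The coefficients in Table 3.10(a) translate directly into arithmetic on the solution of Table 3.9. A Python sketch with the rounded estimates (the function name `lsmean` is illustrative):

```python
# Fixed-effects solution for the RCBD with a normal response (Table 3.9);
# the last level of each effect is set to zero.
intercept = 43.5556
tau = [7.3333, -18.0, 0.0]
block = [1.0, -6.6667, 0.0]


def lsmean(i):
    """LS mean of treatment i: coefficient 1 for the intercept and its own
    treatment, and 1/3 for each block, as in Table 3.10(a)."""
    return intercept + tau[i] + sum(block) / len(block)


lsmeans = [lsmean(i) for i in range(3)]
print([round(m, 4) for m in lsmeans])
```

The block terms contribute the same $-1.8889$ to every treatment, so they cancel in the differences of part (c).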

Table 3.11 shows the estimates obtained with the “estimate” command, including the statement with multiple estimable functions and the Sidak adjustment, together with the contrasts. The Sidak adjustment controls the type I error rate across the set of comparisons: with the “adjust=sidak” option in “estimate” (part (b)), the adjusted P-values (Adj P) are reported in addition to Pr > |t|.
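For $m$ comparisons, the Sidak-adjusted P-value is $p_{adj} = 1 - (1 - p)^m$. The Python sketch below applies it to the three unadjusted P-values of Table 3.11(b) with $m = 3$; because the tabled P-values are rounded, the last digit can differ slightly from SAS (0.2797 here versus 0.2796).

```python
def sidak(p, m):
    """Sidak-adjusted P-value for one of m comparisons."""
    return 1 - (1 - p) ** m


# Unadjusted P-values of the three pairwise estimates (Table 3.11(b)).
raw = {"Trt1 vs Trt2": 0.0019, "Trt1 vs Trt3": 0.1036, "Trt2 vs Trt3": 0.0067}
adjusted = {label: sidak(p, len(raw)) for label, p in raw.items()}
for label, p_adj in adjusted.items():
    print(f"{label}: {p_adj:.4f}")
```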
Table 3.9 Estimation of treatment and block parameters

Parameter estimates

Effect              Estimate    Standard error    DF    t-value    Pr > |t|
Intercept (η)        43.5556    3.1866            4     13.67      0.0002
Trt 1 (τ1)            7.3333    3.4907            4      2.10      0.1036
Trt 2 (τ2)          -18.0000    3.4907            4     -5.16      0.0067
Trt 3 (τ3)            0         .                 .      .         .
BLOCK 1 (block1)      1.0000    3.4907            4      0.29      0.7887
BLOCK 2 (block2)     -6.6667    3.4907            4     -1.91      0.1288
BLOCK 3 (block3)      0         .                 .      .         .
Scale (σ²)           18.2778    12.9243

Table 3.10 Coefficients for treatment least squares means, least squares means, and their differences

(a) Coefficients for Trt least squares means

Effect       Row1      Row2      Row3
Intercept    1         1         1
Trt 1        1         0         0
Trt 2        0         1         0
Trt 3        0         0         1
BLOCK 1      0.3333    0.3333    0.3333
BLOCK 2      0.3333    0.3333    0.3333
BLOCK 3      0.3333    0.3333    0.3333

(b) Trt least squares means

Trt    Estimate    Standard error    DF    t-value    Pr > |t|
1      49.0000     2.4683            4     19.85      <0.0001
2      23.6667     2.4683            4      9.59      0.0007
3      41.6667     2.4683            4     16.88      <0.0001

(c) Differences of Trt least squares means

Trt    _Trt    Estimate    Standard error    DF    t-value    Pr > |t|
1      2        25.3333    3.4907            4      7.26      0.0019
1      3         7.3333    3.4907            4      2.10      0.1036
2      3       -18.0000    3.4907            4     -5.16      0.0067


The contrasts in Table 3.11(c) test the planned comparisons defined by the matrix K. For single-degree-of-freedom contrasts, each F-value equals the square of the t-value of the corresponding estimate (for example, $7.26^2 \approx 52.67$ for Trt1 vs Trt2), so the “estimate” and “contrast” commands give the same P-values, and the two 2-df contrasts give the same test of equality of treatments (F = 27.89, P = 0.0045).

Table 3.11 Multiple estimates and contrasts

(a) Estimates

Label                     Estimate    Standard error    DF    t-value    Pr > |t|
LSM Trt1                   49.0000    2.4683            4     19.85      <0.0001
Overall mean               38.1111    1.4251            4     26.74      <0.0001
Average Trt1&Trt2          36.3333    1.7454            4     20.82      <0.0001
Average Trt1&Trt2&Trt3     38.1111    1.4251            4     26.74      <0.0001
Trt1 vs Trt2               25.3333    3.4907            4      7.26      0.0019
Trt1 vs Trt3                7.3333    3.4907            4      2.10      0.1036
Trt2 vs Trt3              -18.0000    3.4907            4     -5.16      0.0067

(b) Estimate adjustment for multiplicity: Sidak

Label           Estimate    Standard error    DF    t-value    Pr > |t|    Adj P
Trt1 vs Trt2     25.3333    3.4907            4      7.26      0.0019      0.0057
Trt1 vs Trt3      7.3333    3.4907            4      2.10      0.1036      0.2796
Trt2 vs Trt3    -18.0000    3.4907            4     -5.16      0.0067      0.0200

(c) Contrasts

Label                        Num DF    Den DF    F-value    Pr > F
Trt1 vs Trt2                 1         4         52.67      0.0019
Trt1 vs Trt3                 1         4          4.41      0.1036
Trt2 vs Trt3                 1         4         26.59      0.0067
Trt1 vs average Trt2,Trt3    1         4         29.19      0.0057
Trt2 vs average Trt1,Trt3    1         4         51.37      0.0020
Type 3 Trt SS                2         4         27.89      0.0045
Type 3 Trt Test              2         4         27.89      0.0045

Now, the same dataset is fitted with the same linear predictor but assuming that the response variable is binomial. This analysis is intended to show the options available in the SAS commands for fitting non-normal responses, in this case binomial. Essentially the same commands as in the previous program are used, but now the options “ilink,” “oddsratio,” and “exp” are added, and we explain under what circumstances each should be used. This is necessary because all estimable functions produce estimates on the model scale, and we must decide which conversions are needed to obtain results on the data scale. The estimable functions and the conversions required to report them on the data scale are listed below.


(a) Least squares means (“lsmeans”): no conversion for normal data; inverse link (“ilink”) for non-normal data

(b) Differences between pairs of treatment means (“lsmeans” with “diff”): no conversion for normal data; odds ratio (“oddsratio”) for non-normal data

(c) Mean of a treatment (“estimate”): no conversion for normal data; inverse link (“ilink”) for non-normal data

(d) Treatment i vs treatment i' (“estimate”): exponentiation (“exp”) or odds ratio for non-normal data

(e) Multiple estimates of treatment differences: “exp” (or odds ratio) for non-normal data

(f) Contrasts (“contrast”): conversion to the data scale is not necessary, since the result is only an F-test.


The following GLIMMIX program shows how to implement this model with a 
binomial response. 
proc glimmix;
   class trt block;
   model y/n = trt block / solution oddsratio;
   lsmeans trt / diff e oddsratio ilink;
   estimate 'lsm trt1' intercept 3 trt 3 0 0 block 1 1 1 / divisor=3 ilink;
   estimate 'difference trt1 vs trt2' trt 1 -1 0 / exp;
   estimate 'avg trt1&trt2&trt3' intercept 9 trt 3 3 3 block 3 3 3 / divisor=9;
   estimate 'trt1 vs trt2' trt 1 -1 0 / exp;
   estimate 'trt1 vs trt3' trt 1 0 -1 / exp;
   estimate 'trt2 vs trt3' trt 0 1 -1 / exp;
   /* several estimable functions in one statement, exponentiated and Sidak-adjusted */
   estimate 'trt1 vs trt2' trt 1 -1 0,
            'trt1 vs trt3' trt 1 0 -1,
            'trt2 vs trt3' trt 0 1 -1 / exp adjust=sidak;
   contrast 'trt1 vs trt2' trt 1 -1 0;
   contrast 'trt1 vs trt3' trt 1 0 -1;
   contrast 'trt2 vs trt3' trt 0 1 -1;
   contrast 'trt1 vs average trt2,trt3' trt 2 -1 -1;
   contrast 'trt2 vs average trt1,trt3' trt -1 2 -1;
   contrast 'type3 trt ss' trt 1 0 -1, trt 0 1 -1;
   contrast 'type3 trt test' trt 2 -1 -1, trt -1 2 -1;
run;


Part of the output is shown in Table 3.12. Part (a) gives the estimated treatment and block parameters of the model; the last level of each class effect (treatment 3 and block 3) is set to zero because the design matrix is not of full rank. Part (b) gives the Type III tests of fixed effects, and part (c) gives the odds ratio estimates. Note that $\hat{\sigma}^2$ does not appear in the output because the variance of the binomial distribution is not an independent parameter: it is determined by the mean, $\text{Var}(y_{ij}) = N_{ij}\pi_{ij}(1 - \pi_{ij})$.

Table 3.12 (parts (b) and (c)), which shows the Type III tests of fixed effects and the odds ratios, indicates that only the treatment effect is significant (P = 0.0040), not the block effect (P = 0.8258), which suggests that these data could also be analyzed as a completely randomized design. Two sets of odds ratios were estimated (part (c)): one for the treatment effects and the other for the block effects in the model. By default, each level of a factor is compared with the last level of that factor.

The estimates obtained with “estimate” (Table 3.13, parts (a) and (b)) are on the model scale, whereas the last column is obtained by exponentiation, $e^{\hat{\tau}_i - \hat{\tau}_{i'}}$.

The least squares means for treatments and the linear predictors of treatment differences (parts (a) and (b) of Table 3.14, respectively) obtained with “lsmeans” are the values under the “Estimate” column; these, together with their standard errors, were obtained from the linear predictor

$$\hat{\eta}_{i\cdot} = \hat{\eta} + \hat{\tau}_i + \overline{block}_{\cdot}$$

These estimates are on the model scale, whereas the values under the “Mean” column and their standard errors were obtained by applying the inverse link, which gives the probability of success of each treatment ($\hat{\pi}_i$). For the treatment differences, the “oddsratio” option reports $e^{\hat{\eta}_{i\cdot} - \hat{\eta}_{i'\cdot}}$, the odds ratio, which is the data-scale counterpart of the difference between linear predictors.
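The “Estimate,” “Mean,” and “Odds ratio” columns of Table 3.14 can be reproduced from the rounded solution in Table 3.12(a): average the block effects on the logit scale, add the intercept and the treatment effect, and then apply the inverse logit; the odds ratio for a pair of treatments is the exponentiated difference of their linear predictors. A Python sketch under those assumptions (results agree with the table up to rounding):

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Rounded solution of the binomial RCBD, Table 3.12(a) (last levels set to zero).
intercept = 0.5099
tau = [0.5097, -1.1088, 0.0]
block = [0.06541, -0.07205, 0.0]

block_avg = sum(block) / len(block)             # blocks averaged on the logit scale
eta = [intercept + t + block_avg for t in tau]  # "Estimate" column, Table 3.14(a)
prob = [inv_logit(e) for e in eta]              # "Mean" column, Table 3.14(a)
odds_ratio_12 = math.exp(eta[0] - eta[1])       # "Odds ratio" for Trt 1 vs Trt 2

print([round(e, 4) for e in eta], [round(p, 4) for p in prob], round(odds_ratio_12, 3))
```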
Table 3.12 Results of the analysis of variance in a binomial model

(a) Parameter estimates

Effect              Estimate    Standard error    DF    t-value    Pr > |t|
Intercept (η)        0.5099     0.1883            4      2.71      0.0536
Trt 1 (τ1)           0.5097     0.2169            4      2.35      0.0785
Trt 2 (τ2)          -1.1088     0.2079            4     -5.33      0.0059
Trt 3 (τ3)           0          .                 .      .         .
BLOCK 1 (block1)     0.06541    0.2088            4      0.31      0.7698
BLOCK 2 (block2)    -0.07205    0.2164            4     -0.33      0.7559
BLOCK 3 (block3)     0          .                 .      .         .

(b) Type III tests of fixed effects

Effect    Num DF    Den DF    F-value    Pr > F
Trt       2         4         29.55      0.0040
BLOCK     2         4          0.20      0.8258

(c) Odds ratio estimates

Comparison            Estimate    DF    95% confidence limits
Trt 1 vs Trt 3        1.665       4     0.912    3.040
Trt 2 vs Trt 3        0.330       4     0.185    0.588
BLOCK 1 vs BLOCK 3    1.068       4     0.598    1.906
BLOCK 2 vs BLOCK 3    0.930       4     0.510    1.697


Table 3.14 Estimated linear predictors for treatments and treatment differences, with their inverse-link values

(a) Trt least squares means

Trt    Estimate    Standard error    DF    t-value    Pr > |t|    Mean      Standard error mean
1       1.0174     0.1603            4      .         0.0032      0.7345    0.03127
2      -0.6011     0.1480            4     -4.06      0.0153      0.3541    0.03385
3       0.5077     0.1462            4      3.47      0.0255      0.6243    0.03429

(b) Differences of Trt least squares means

Trt    _Trt    Estimate    Standard error    DF    t-value    Pr > |t|    Odds ratio
1      2        1.6184     0.2181            4      7.42      0.0018      5.045
1      3        0.5097     0.2169            4      2.35      0.0785      1.665
2      3       -1.1088     0.2079            4     -5.33      0.0059      0.330

3.3 Fixed and Random Effects in the Inference Space

In an analysis, inference can be directed solely at fixed effects (population inference) or at a combination of fixed and random effects (specific inference). To illustrate these two levels of inference, we consider two examples.

3.3.1 A Broad Inference Space or a Population Inference

In practice, the random effects in a linear mixed model (LMM) should represent the population from which the data were collected, and they should enter the study as if they came from a well-planned sample. Random effects in a model can be locations, regions, states, blocks, and so on, and they have two particular characteristics:

* Random effects represent the target population.
* Random effects have a probability distribution.

These two characteristics give a broad inference space, in which point estimates, interval estimates, and hypothesis tests apply to the entire population represented by the random effects. Formally, an estimate or hypothesis test based on an LMM lies in the broad inference space when it is defined by the estimable function $K'\beta$ alone, that is, when the coefficient matrix $Z$ of the random effects contains only zeros; otherwise, it is defined by the predictable function $K'\beta + Z'b$ and is a specific inference.


3.3.2 Mixed Models with a Normal Response 


In Example 3.2, the response variable was modeled as a function of fixed treatment and block effects. Now, suppose that the treatments were applied by three different people (blocks). Treating the block effects as fixed would then be questionable, because each person does the job according to their own experience, skill, and so forth. There is variability between blocks that is not due to the treatments and that must be accounted for, so the block effects should be considered random. In this example, let us assume that the three blocks (persons) were randomly selected from a population. The components of the model are then defined as follows:


Distribution:

(a) $y_{ij} \mid block_j \sim N(\mu_{ij}, \sigma^2)$
(b) $block_j \sim N(0, \sigma^2_{block})$

Linear predictor: $\eta_{ij} = \eta + \tau_i + block_j \quad (i, j = 1, 2, 3)$
Link function: $\eta_{ij} = \mu_{ij}$ (identity link)


Note the impact of this change on the estimable function for the treatment means. In Example 3.2, the estimable function was $E(\bar{y}_{i\cdot}) = \eta + \tau_i + \overline{block}_{\cdot}$. With the mixed model (fixed treatment effects and random blocks), it is $E(\bar{y}_{i\cdot}) = \eta + \tau_i$, because $E(block_j) = 0$. Therefore, the estimable function for the mean of each treatment is $\eta + \tau_i$. In this situation, two questions arise:

* How much do the results obtained from a fixed-effects model differ from those obtained from a mixed model?
* How can we compare the two sets of results?


The following program allows us to estimate a mixed model with a normal 
response. 


proc glimmix;
   class trt block;
   model y = trt / solution;
   random block / solution;
   lsmeans trt / diff e;
   /* fixed-effect coefficients | random-effect coefficients */
   estimate 'lsm trt1' intercept 1 trt 1 0 0 | block 0 0 0;
   estimate 'lsm trt2' intercept 1 trt 0 1 0 | block 0 0 0;
   estimate 'lsm trt3' intercept 1 trt 0 0 1 | block 0 0 0;
   estimate 'blup trt1' intercept 3 trt 3 0 0 | block 1 1 1 / divisor=3;
   estimate 'blup trt2' intercept 3 trt 0 3 0 | block 1 1 1 / divisor=3;
   estimate 'blup trt3' intercept 3 trt 0 0 3 | block 1 1 1 / divisor=3;
run;


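In the previous SAS GLIMMIX code, the “estimate” command gives the coefficients associated with the fixed effects before the vertical bar (|) and the coefficients for the random effects after it. For the BLUP statements, this corresponds to

$$K'\beta + Z'b = \underbrace{\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \eta \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{bmatrix}}_{\text{fixed effects}} + \underbrace{\frac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} block_1 \\ block_2 \\ block_3 \end{bmatrix}}_{\text{random effects}}$$

whereas in the “lsm” statements all coefficients of the random effects are zero.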


Part of the output is shown in Table 3.15. Part (a) shows the estimated variance components: the variance due to blocks is $\hat{\sigma}^2_{block} = 11.2778$, and the residual variance of the observations conditional on the blocks, that is, the mean squared error (MSE), is $\hat{\sigma}^2 = 18.2778$. Part (b) gives the fixed-effects solution obtained with the “solution” option. The analysis of variance (part (c)) indicates a significant difference among treatments (P = 0.0045), so the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ is rejected.

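Because the predicted block effects average zero, the treatment means estimated with the “estimate” statements (which have zero coefficients for the random effects) and the corresponding best linear unbiased predictors (BLUPs, which average over the block effects) take the same values, as shown in Table 3.16 (part (a)). They differ in their standard errors: 3.1388 for the means, which apply to the broad inference space, and 2.4683 for the BLUPs, which apply only to the three blocks observed. Parts (b) and (c) show the means and the differences between pairs of means estimated with the “lsmeans” statement and the “diff” option.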
Table 3.15 Variance components, fixed effects, and fixed effects test

(a) Covariance parameter estimates

Cov Parm    Estimate    Standard error
BLOCK       11.2778     17.8966
Residual    18.2778     12.9243

(b) Solutions for fixed effects

Effect       Trt    Estimate    Standard error    DF    t-value    Pr > |t|
Intercept            41.6667    3.1388            2     13.27      0.0056
Trt          1        7.3333    3.4907            4      2.10      0.1036
Trt          2      -18.0000    3.4907            4     -5.16      0.0067
Trt          3        0         .                 .      .         .

(c) Type III tests of fixed effects

Effect    Num DF    Den DF    F-value    Pr > F
Trt       2         4         27.89      0.0045

3.4 Marginal and Conditional Models 


The process of analyzing a dataset has two main objectives: the first is model 
selection, which aims to find well-fitting parsimonious models for the responses 
being measured, and the second is model prediction, where estimates from the 
selected models are used to predict quantities of interest and their uncertainties. 

The apparent differences between analyses arise mainly from the choice of constraints on unidentifiable quantities in the random effects. To compare two different models, we must compare analogous quantities: different constraints can lead to models that look very different but are inferentially identical. Lee and Nelder (2004) regard the conditional model as the basic model, since any conditional model leads to a specific marginal model; they worked with conditional models derived from hierarchical generalized linear models (HGLMs) and with the marginal models derived from these conditional models. They also discuss the drawbacks of generalized estimating equations (GEEs), which have often been used to fit marginal models.


3.4.1 Marginal Versus Conditional Models 


Consider two models with a normal distribution. One is a random effects model (a mixed model),

$$y_{ij} = \mu + \tau_i + b_j + e_{ij}$$

where $b_j \sim N(0, \sigma_b^2)$ and $e_{ij} \sim N(0, \sigma^2)$. The other is a marginal model,

$$E(y_{ij}) = \mu + \tau_i$$

where the elements of $V(y) = \Sigma$ are variances and covariances with an arbitrary correlation structure. Zeger et al. (1988) pointed out that, given a marginal model, the generalized estimating equations yield consistent estimators. An obvious advantage of random effects models is that they allow conditional inferences in addition to marginal inferences (Robinson 1991). With the random effects model, we can obtain not only the conditional mean

$$\mu_{ij} = E(y_{ij} \mid b_j) = \mu + \tau_i + b_j$$

but also the marginal mean

$$\mu_i = E(\mu_{ij}) = E\left[E(y_{ij} \mid b_j)\right] = \mu + \tau_i$$

whereas with the marginal model, we can obtain only the marginal mean $\mu_i$.

Table 3.16 Estimated means, best linear unbiased estimates (BLUEs), and BLUPs for treatments and differences between pairs of means

(a) Estimates

Label        Estimate    Standard error    DF    t-value    Pr > |t|
lsm trt1     49.0000     3.1388            4     15.61      <0.0001
lsm trt2     23.6667     3.1388            4      7.54      0.0017
lsm trt3     41.6667     3.1388            4     13.27      0.0002
blup trt1    49.0000     2.4683            4     19.85      <0.0001
blup trt2    23.6667     2.4683            4      9.59      0.0007
blup trt3    41.6667     2.4683            4     16.88      <0.0001

(b) Trt least squares means

Trt    Estimate    Standard error    DF    t-value    Pr > |t|
1      49.0000     3.1388            4     15.61      <0.0001
2      23.6667     3.1388            4      7.54      0.0017
3      41.6667     3.1388            4     13.27      0.0002

(c) Differences of Trt least squares means

Trt    _Trt    Estimate    Standard error    DF    t-value    Pr > |t|
1      2        25.3333    3.4907            4      7.26      0.0019
1      3         7.3333    3.4907            4      2.10      0.1036
2      3       -18.0000    3.4907            4     -5.16      0.0067

It may be reasonable to assume that the unobservable random effects of blocks ($b_j$) follow a certain distribution. However, the center of this distribution cannot be identified because it is confounded with the intercept. Therefore, in the random effects model, we impose the identifiability constraints $E(b_j) = 0$ and $E(e_{ij}) = 0$, as we do for the error terms in linear models. In the mixed model, these restrictions become $\sum_j \hat{b}_j = 0$ and $\sum_{ij} \hat{e}_{ij} = 0$ in any estimation procedure. First, we
Table 3.17 Mortality of coffee seedling clones (C) in different substrates (S)

Block S C Mortality Pct Block S C Mortality Pct
1 3 1 3 3 1 6.6 0.066
1 3 3 3 3 2 10 0.1
1 3 2 3 3 3 56.6 0.566
1 1 1 3 2 1 3.3 0.033
1 1 3 3 2 2 26.6 0.266
1 1 2 3 2 3 40 0.4
1 2 1 3 4 1 3.3 0.033
1 2 3 3 4 2 46 0.46
1 2 2 3 4 3 33.3 0.333
1 4 1 3 1 1 6.6 0.066
1 4 3 3 1 2 43.3 0.433
1 4 2 3 1 3 50 0.5
2 4 3 4 4 1 33 0.33
2 4 1 4 4 2 10 0.1
2 4 2 4 4 3 23.3 0.233
2 1 3 4 2 3 50 0.5
2 1 1 4 1 2 23.3 0.233
2 1 2 4 1 3 6.6 0.066
2 2 3 4 2 1 16 0.16
2 2 1 4 2 2 10 0.1
2 2 2 4 2 3 16 0.16
2 3 3 3.3 0.033


consider the case in which the data follow a normal distribution. We then briefly 
discuss how the results differ for data with a non-normal distribution. 


3.4.2 Normal Distribution 


Example An experiment evaluated the effect of four substrates (factor A), three made from vermicompost and one from compost, on physiological variables and on the mortality of cuttings of three clones (factor B) of robusta coffee (Coffea canephora P.). The levels of factor A were randomly assigned to rows in each block, with the following restriction: each block receives levels A1, A2, A3, and A4. The levels of factor B (B1, B2, and B3) were randomly assigned to columns, so that each level of B crosses every level of factor A in each block. The data for this experiment are tabulated in Table 3.17.

Note that although there are two randomization processes, there are effectively three sizes of experimental units: rows for A levels, columns for B levels, and row-column intersections for A × B combinations. Thus, the experimental design used was a randomized complete block design with a strip-plot treatment arrangement.
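The two independent randomizations that define a strip-plot arrangement can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the original study. It assumes four blocks, as in Table 3.17. The function name `strip_plot_layout` and the seed are hypothetical. Within every block, the A levels are shuffled over rows and the B levels are shuffled over columns. Each row-column intersection then receives one A × B combination.

```python
import random

def strip_plot_layout(n_blocks, a_levels, b_levels, seed=None):
    """Randomize a strip-plot arrangement within each block.

    Rows receive a random permutation of the A levels and columns receive an
    independent random permutation of the B levels; cell (r, c) gets (A_r, B_c).
    """
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        rows = list(a_levels)
        cols = list(b_levels)
        rng.shuffle(rows)   # randomization 1: A levels -> rows (strips)
        rng.shuffle(cols)   # randomization 2: B levels -> columns (strips)
        layout[block] = [[(a, b) for b in cols] for a in rows]
    return layout

layout = strip_plot_layout(4, ["A1", "A2", "A3", "A4"], ["B1", "B2", "B3"], seed=1)
for row in layout[1]:
    print(row)
```

Every row of a block carries a single A level and every column a single B level. This is why rows, columns, and their intersections act as three different sizes of experimental unit.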
The model for these data is:

$$y_{ijk} = \mu + b_k + \alpha_i + (\alpha b)_{ik} + \beta_j + (\beta b)_{jk} + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where $y_{ijk}$ is the response observed in the kth block at the ith level of factor A and the jth level of factor B; $\mu$ is the overall mean; $b_k$ is the random effect of blocks, assuming $b_k \sim N(0, \sigma_b^2)$; $\alpha_i$ is the fixed effect of the substrate type (S); $(\alpha b)_{ik}$ is the random effect of the substrate × block interaction, assuming $(\alpha b)_{ik} \sim N(0, \sigma_{\alpha b}^2)$; $\beta_j$ is the fixed effect of the coffee clone type (C); $(\beta b)_{jk}$ is the random effect of the clone × block interaction, assuming $(\beta b)_{jk} \sim N(0, \sigma_{\beta b}^2)$; $(\alpha\beta)_{ij}$ is the fixed substrate × clone interaction effect; and $\varepsilon_{ijk}$ is the normal random error, $\varepsilon_{ijk} \sim N(0, \sigma^2)$. The components of the model for this dataset are as follows:


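Linear predictor: $\eta_{ijk} = \mu + b_k + \alpha_i + (\alpha b)_{ik} + \beta_j + (\beta b)_{jk} + (\alpha\beta)_{ij}$
Distributions: $y_{ijk} \mid b_k, (\alpha b)_{ik}, (\beta b)_{jk} \sim N(\mu_{ijk}, \sigma^2)$

$b_k \sim N(0, \sigma_b^2)$; $(\alpha b)_{ik} \sim N(0, \sigma_{\alpha b}^2)$; $(\beta b)_{jk} \sim N(0, \sigma_{\beta b}^2)$

Link function: $\eta_{ijk} = \mu_{ijk}$ (identity)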
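The following GLIMMIX syntax specifies a GLMM with a normal response.


proc glimmix;
  class block s c;
  model y = s|c;                          /* y: response (mortality) */
  random intercept s c / subject=block;   /* block, block x S, block x C */
  lsmeans s*c / slicediff=s;              /* clone differences within each substrate */
run;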


Part of the results of this analysis is shown in Table 3.18. The estimated variance components for blocks, block × substrate, block × clone, and the residual are $\hat{\sigma}_b^2 = 23.4714$, $\hat{\sigma}_{\alpha b}^2 = 35.4995$, $\hat{\sigma}_{\beta b}^2 = 67.0160$, and $\hat{\sigma}^2 = \text{MSE} = 139.58$, respectively, as listed in part (a) of Table 3.18. The fixed effects tests for both factors and their interaction (part (b)) are not statistically significant.

According to the “slicediff = s" option in the "Ismeans" statement, Table 3.19 
shows the simple effects of each substrate level at varying clone levels. 


3.4.3 Non-normal Distribution 


Example Using the data in Table 3.17, but now assuming a beta distribution for the mortality proportion (Pct), the components of the GLMM change slightly:
Table 3.18 Estimated variance components and type III tests of fixed effects

(a) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Intercept   Block     23.4714    58.9336
S           Block     35.4995    44.9134
C           Block     67.0160    64.0909
Residual              139.58     49.9056

(b) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
S        3        8.318    0.48      0.7076
C        2        5.935    1.68      0.2650
S*C      6        16.65    0.44      0.8441
Table 3.19 Simple effect comparisons across substrate levels
Simple effect comparisons of S*C least squares means by S
Simple effect level   C   _C   Estimate   Standard error   DF   t-value   Pr > |t|
S1                    1   2    -12.9814   10.9667          15   -1.18     0.2549
S1                    1   3    -12.1564   10.9667          15   -1.11     0.2851
S1                    2   3      0.8250   10.1636          15    0.08     0.9364
S2                    1   2     -6.8325   10.1636          15   -0.67     0.5116
S2                    1   3    -15.2779    9.8708          15   -1.55     0.1425
S2                    2   3     -8.4454    9.8708          15   -0.86     0.4057
S3                    1   2     -4.4620   12.8219          15   -0.35     0.7327
S3                    1   3    -19.0350   12.8124          15   -1.49     0.1581
S3                    2   3    -14.5730   11.4417          15   -1.27     0.2222
S4                    1   2    -11.5925   10.1636          15   -1.14     0.2719
S4                    1   3     -8.2425   10.1636          15   -0.81     0.4301
S4                    2   3      3.3500   10.1636          15    0.33     0.7463


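Distributions: $y_{ijk} \mid b_k, (\alpha b)_{ik}, (\beta b)_{jk} \sim \text{Beta}(\mu_{ijk}, \phi)$, where $\phi$ is the scale parameter.

$b_k \sim N(0, \sigma_b^2)$; $(\alpha b)_{ik} \sim N(0, \sigma_{\alpha b}^2)$; $(\beta b)_{jk} \sim N(0, \sigma_{\beta b}^2)$

Linear predictor: $\eta_{ijk} = \mu + b_k + \alpha_i + (\alpha b)_{ik} + \beta_j + (\beta b)_{jk} + (\alpha\beta)_{ij}$
Link function: $\eta_{ijk} = \text{logit}(\mu_{ijk})$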
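A short Python sketch shows what the logit link and the beta variance do numerically. It assumes the mean/scale parameterization $\text{Var}(y) = \mu(1-\mu)/(1+\phi)$, in which a larger scale $\phi$ means less variability. The value $\phi = 16.6041$ is only plugged in for illustration, because it is the scale estimate reported for this example in Table 3.20. The link maps a mean proportion in (0, 1) to the whole real line, and the inverse link maps it back.

```python
import math

def logit(mu):
    """Logit link: maps a mean in (0, 1) to the linear-predictor scale."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse logit link: maps the linear predictor back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_variance(mu, phi):
    """Beta variance under the mean/scale parameterization mu(1 - mu)/(1 + phi)."""
    return mu * (1.0 - mu) / (1.0 + phi)

phi = 16.6041  # scale value used only for illustration
for mu in (0.1, 0.3, 0.5):
    eta = logit(mu)
    print(f"mu={mu:.1f}  eta={eta:+.4f}  back={inv_logit(eta):.4f}  "
          f"var={beta_variance(mu, phi):.5f}")
```

The variance depends on the mean and is largest at $\mu = 0.5$. This mean-variance dependence is what the beta GLMM captures and the normal model ignores.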


The following GLIMMIX syntax specifies a beta response variable.


proc glimmix method=laplace;
  class block s c;
  model pct = s|c / dist=beta link=logit;   /* pct: mortality proportion in (0, 1) */
  random intercept s c / subject=block;     /* block, block x S, block x C */
  lsmeans s*c / plot=meanplot(sliceby=s join) slicediff=s ilink;
run;
Table 3.20 Variance components

(a) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Intercept   Block     0.06723    046702
S           Block     0.1594     0.1420
C           Block     0.1932     0.1687
Scale (φ)             16.6041    5.6153

(b) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
S        3        8        0.74      0.5584
C        2        6        2.60      0.1540
S*C      6        15       0.59      0.7303


Table 3.21 Simple effect comparisons across substrate levels

Simple effect comparisons of S*C least squares means by S
Simple effect level   C   _C   Estimate   Standard error   DF   t-value   Pr > |t|
S1                    1   2    -0.9698    0.6578           15   -1.47     0.1611
S1                    1   3    -0.9298    0.6696           15   -1.39     0.1852
S1                    2   3     0.03999   0.5457           15    0.07     0.9426
S2                    1   2    -0.4407    0.5432           15   -0.81     0.4299
S2                    1   3    -0.8555    0.5237           15   -1.63     0.1231
S2                    2   3    -0.4149    0.4911           15   -0.84     0.4115
S3                    1   2    -0.4588    0.8285           15   -0.55     0.5879
S3                    1   3    -1.3224    0.7773           15   -1.70     0.1095
S3                    2   3    -0.8636    0.6375           15   -1.35     0.1955
S4                    1   2    -0.9880    0.5619           15   -1.76     0.0991
S4                    1   3    -0.7138    0.5739           15   -1.24     0.2326
S4                    2   3     0.2741    0.5261           15    0.52     0.6099


Some of the SAS output from this analysis is shown in Tables 3.20 and 3.21. The estimated variance components for blocks, block × substrate, and block × clone are $\hat{\sigma}_b^2 = 0.06723$, $\hat{\sigma}_{\alpha b}^2 = 0.1594$, and $\hat{\sigma}_{\beta b}^2 = 0.1932$, respectively, and the estimated scale parameter is $\hat{\phi} = 16.6041$. These values are listed in part (a) of Table 3.20. As before, the fixed effects tests for both factors and their interaction (part (b)) are not statistically significant. Compared with the normal model of the previous example, the variance components under the beta distribution are smaller, even when multiplied by 100, and the type III tests of fixed effects are closer to significance.

Table 3.21 shows, for each substrate level, the estimated simple effects (differences between clone levels) on the linear predictor scale. These effects differ from the previous results mainly because, in a GLMM, these values are differences between linear predictors estimated on the model (logit) scale, not between estimated means on the data
Table 3.22 Total yields (grams) of barley varieties in 12 independent trials

                          Variety
Location   Manchuria   Svansota   Velvet   Trebi   Peatland
1           81.0       105.4      119.7    109.7    98.3
1           80.7        85.3       80.4     87.2    84.2
2          146.6       142.0      150.7    191.5   145.7
2          100.4       115.5      112.2    147.4   108.1
3           82.3        77.3       78.4    131.3    89.6
3          103.1       105.1      116.5    139.9   129.6
4          119.8       121.4      124.0    140.8   124.8
4           98.9        61.9       96.2    125.5    75.7
5           98.9        89.0       69.1     89.3   104.1
5           66.4        49.9       96.7     61.9    80.3
6           86.9        77.1       78.9    101.8    96.0
6           67.7        66.7       67.4     91.8    94.1


scale, as in the normal example of Sect. 3.4.2. It is also important to note that the degrees-of-freedom correction used in the estimation of means cannot yet be applied when estimating a GLMM.
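Because the simple effects in Table 3.21 are differences on the logit scale, one way to read them is as log odds ratios. The Python sketch below exponentiates the S1, C1 minus C2 estimate from Table 3.21. It then shows, with two hypothetical baseline logits for clone C2, that the same logit difference implies different differences in mortality proportions. For that reason, data-scale means are obtained by applying the inverse link to each least squares mean (the "ilink" option) and not to the difference.

```python
import math

def inv_logit(eta):
    """Inverse logit link: linear predictor -> proportion."""
    return 1.0 / (1.0 + math.exp(-eta))

# Simple effect for S1 (C1 minus C2) on the logit (model) scale, from Table 3.21
diff_logit = -0.9698

# exp(difference of logits) is an odds ratio: odds of mortality, C1 vs C2, within S1
odds_ratio = math.exp(diff_logit)

# The same logit difference implies different differences in proportions,
# depending on the (hypothetical) baseline logit of clone C2.
for base in (-2.0, 0.0):
    p_c2 = inv_logit(base)
    p_c1 = inv_logit(base + diff_logit)
    print(f"baseline logit {base:+.1f}: p(C2)={p_c2:.3f}  p(C1)={p_c1:.3f}  "
          f"diff={p_c1 - p_c2:+.3f}")

print(f"odds ratio C1 vs C2 within S1: {odds_ratio:.3f}")
```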


3.5 Exercises 


Exercise 3.5.1 The data in Table 3.22 show the yields of five barley varieties in a randomized complete block experiment conducted in Minnesota (Immer et al. 1934).


(a) Write a complete description of the statistical model associated with this study and the assumptions of this model.

(b) Compute the ANOVA for the model described in part (a) and determine whether there are significant differences among the varieties.

(c) Use the least significant difference (LSD) method to make pairwise comparisons of variety mean yields.


Exercise 3.5.2 Lew (2007) conducted an experiment to determine whether cultured cells respond to two drugs (chemical formulations). The experiment was conducted using a cell culture line placed in Petri dishes. Each experimental trial consisted of three Petri dishes: one treated with drug 1, one treated with drug 2, and one left untreated as a control. The data are shown in Table 3.23:


(a) Write a complete description of the statistical model associated with this study 
and the assumptions of this model. 

(b) Analyze the data using a completely randomized design. Is there a significant 
difference between the treatment groups? 
Table 3.23 Data for Exercise 3.5.2
Trial 1   1147   1169   1009
Trial 2   1273   1323   1260
Trial 3   1216   1276   1143
Trial 4   1046   1240   1099
Trial 5   1108   1432   1385
Trial 6   1265   1562   1164


(c) Analyze the data as a randomized complete block design, with the trials as the blocking factor.

(d) Do the results obtained in (b) and (c) differ? If so, explain what might cause the difference and which method you would recommend.
Chapter 4
Generalized Linear Mixed Models
for Non-normal Responses


4.1 Introduction


Generalized linear mixed models (GLMMs) have been recognized as one of the major methodological developments of recent years, as evidenced by the increasing use of these sophisticated statistical tools, which offer broad applicability and flexibility. This family of models can be applied to a wide range of data types (continuous, categorical (nominal or ordinal), percentages, and counts), using a distribution appropriate to each type of data. This modern methodology allows data to be described through the distribution of the exponential family that best fits the response variable. These complex models were computationally infeasible until recently, when advances in statistical software allowed users to apply GLMMs (Zuur et al. 2009; Stroup 2012; Zuur et al. 2013). Researchers in fields other than statistical science are also interested in modeling the structure of data. For example, in the social sciences, there have been applications in education, where several tests are applied to students; in longitudinal personality studies, where the occurrence of an emotion is repeatedly observed over time in a set of people; and in surveys investigating the political preferences of a population, among others.
Likewise, agriculture and the life sciences are other major areas where the measured response variables depart from the classical methodology based on “normality” that is conventionally used to model or describe a dataset; such data generally fall on nominal, ordinal, or interval (continuous) scales of measurement. In a GLMM, the response data are not transformed; instead, a function of the expected value of the response (the link function) is modeled as a linear function of the explanatory variables. GLMMs are a powerful tool that allows proper modeling of variation among groups and over space and time, leading to more accurate modeling of the observed data and estimation of variance components.
4.2 A Brief Description of Linear Mixed Models (LMMs)


Before addressing GLMMs, we present a brief overview of linear mixed models (LMMs). An LMM is a model whose response variable is normal and which assumes (1) that the relationship between the mean of the dependent variable (y) and the fixed and random effects can be modeled as a linear function; (2) that the variance is not a function of the mean; and (3) that the random effects follow a normal distribution. The classic representation of an LMM in matrix form is


$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \boldsymbol{\varepsilon} \qquad (4.1)$$


where $\mathbf{y}$ is the ($n \times 1$) vector of the response variable; $\mathbf{X}$ is the ($n \times (p + 1)$) design matrix of fixed effects, with rank $k$; $\boldsymbol{\beta}$ is the ($(p + 1) \times 1$) vector of unknown fixed effects parameters; $\mathbf{Z}$ is the ($n \times q$) design matrix of random effects; and $\mathbf{b}$ is the ($q \times 1$) vector of random effects. The vector $\mathbf{b}$ is assumed to follow a normal distribution with mean $\mathbf{0}$ and variance matrix $\mathbf{G}$, that is, $\mathbf{b} \sim N(\mathbf{0}, \mathbf{G})$. Finally, $\boldsymbol{\varepsilon}$ is the error vector, normally distributed with mean $\mathbf{0}$ and variance-covariance matrix $\mathbf{R}$ ($\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R})$). The vectors $\mathbf{b}$ and $\boldsymbol{\varepsilon}$ are assumed to be independent of each other.

Model 4.1 can be described in terms of a probability distribution in two ways. The first is the marginal model $\mathbf{y} \sim N(E[\mathbf{y}] = \mathbf{X}\boldsymbol{\beta},\ \text{Var}[\mathbf{y}] = \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R})$. Here the mean depends only on the fixed effects, and the parameters describing the random effects are contained in the variance-covariance matrix $\mathbf{V}$ (Littell et al. 2006). The second is the conditional model $\mathbf{y} \mid \mathbf{b} \sim N(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}, \mathbf{R})$. Under normality, the two formulations are equivalent and produce the same solution, whereas when the response is not normal, they produce different solutions (Stroup 2012).
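The marginal covariance matrix $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$ can be built explicitly for a tiny design to see where the random effects "go" in the marginal model. The following Python sketch is a minimal illustration that assumes two blocks with two observations each. It uses hypothetical variances, with $\mathbf{G} = \sigma_b^2 \mathbf{I}$ and $\mathbf{R} = \sigma^2 \mathbf{I}$. Pure-Python matrix helpers are used so that the example has no dependencies.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def marginal_cov(Z, sigma2_b, sigma2_e):
    """V = Z G Z' + R with G = sigma2_b * I_q and R = sigma2_e * I_n."""
    n, q = len(Z), len(Z[0])
    G = [[sigma2_b if i == j else 0.0 for j in range(q)] for i in range(q)]
    ZGZt = matmul(matmul(Z, G), transpose(Z))
    return [[ZGZt[i][j] + (sigma2_e if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Two blocks, two observations per block: column j of Z flags block j.
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
V = marginal_cov(Z, sigma2_b=2.0, sigma2_e=1.0)
for row in V:
    print(row)
```

Observations in the same block share covariance $\sigma_b^2$, and each has variance $\sigma_b^2 + \sigma^2$. Observations in different blocks are uncorrelated. In the marginal model, the block effect appears only through this covariance structure.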


4.3 Generalized Linear Mixed Models 


Many datasets in the agricultural, biological, and social sciences fall outside the scope of the traditional methods taught in introductory statistics and statistical methods courses. Often, these data (response variables) are (a) binary (the presence or absence of a trait of interest, success or failure, the infection status of an individual, or the expression of a genetic disorder); (b) proportions (the ratio of females to males, infection or mortality rates within a group of individuals); or (c) counts (the number of emerging seedlings, the number of sprouts, etc.). In each case, basic statistical methods attempt to quantify the effect of each predictor variable. However, these studies often involve random effects, whose purpose is to quantify variation among individuals or units. The most common random effects are blocks in experimental or observational studies that are replicated across sites (locations or environments) or over time. Random effects also encompass variation among individuals (when multiple responses are measured per individual, such as the survival or sex ratios of multiple offspring), genotypes, species, and regions or periods of time.

GLMMs are a powerful class of statistical tools that combine the concepts of generalized linear models (GLMs) and linear mixed models (LMMs). That is, a GLMM is an extension of the GLM in which the linear predictor contains random effects in addition to fixed effects. These models handle a wide range of response distributions and of schemes under which observations are sampled. GLMMs extend the theory of LMMs to response variables with a non-normal distribution. In GLMMs, the response data are not transformed. Instead, a function g of the conditional expectation $E(\mathbf{y} \mid \mathbf{b})$ is expressed as a linear function of the explanatory variables; that is, the response is modeled conditional on the random effects. This link function relates the response to the explanatory variables in a linear manner, thus allowing the use of standard LMM techniques for estimation and hypothesis testing.

A conditional model is used to describe a GLMM with non-Gaussian errors, extending Model 4.1 through a link function g, as shown below:


$$g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b},$$
which is a function of the conditional expectation given by
$$E[\mathbf{y} \mid \mathbf{b}] = g^{-1}(\mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}) = g^{-1}(\boldsymbol{\eta}) = \boldsymbol{\mu} \qquad (4.2)$$
where $g^{-1}(\cdot)$ is the inverse link and the other terms were defined earlier. The fixed and random effects are combined to form the conditional linear predictor


$$\boldsymbol{\eta} = g(E[\mathbf{y} \mid \mathbf{b}]) = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \qquad (4.3)$$


The relationship between the linear predictor and the vector of observations is 
modeled as follows: 


$$\mathbf{y} \mid \mathbf{b} \sim (g^{-1}(\boldsymbol{\eta}), \mathbf{R}) \qquad (4.4)$$


The above notation (4.4) expresses that the conditional distribution of y given b has mean $g^{-1}(\boldsymbol{\eta})$ and variance R. Note that instead of specifying the distribution of y, as in the case of a GLM, we specify a distribution for the conditional response y | b.

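The variance-covariance matrix of the observations is given by:


$$V(\mathbf{y}) = E[V(\mathbf{y} \mid \mathbf{b})] + V[E(\mathbf{y} \mid \mathbf{b})] \approx \mathbf{A}^{1/2}\mathbf{R}\mathbf{A}^{1/2} + \mathbf{Z}\mathbf{G}\mathbf{Z}' \qquad (4.5)$$


where A is a diagonal matrix containing the variance functions of the model. The first equality is the law of total variance; the second expression is exact for the identity link (an LMM, where A = I) and is otherwise an approximation.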
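For a Poisson response with a log link and a single normal random effect, the two terms of the law of total variance have closed forms. This makes it possible to see how far $V[E(\mathbf{y} \mid \mathbf{b})]$ can be from the variance of the random effect on the link scale. The Python sketch below assumes $y \mid b \sim \text{Poisson}(e^{\eta + b})$ with $b \sim N(0, \sigma^2)$. The values of $\eta$ and $\sigma^2$ are hypothetical. As a check, it compares the closed forms with a trapezoidal-rule integration over the normal density of b.

```python
import math

def closed_form_moments(eta, sigma2):
    """Marginal mean and variance of y when y | b ~ Poisson(exp(eta + b)), b ~ N(0, sigma2)."""
    mean = math.exp(eta + sigma2 / 2.0)                                         # E[E(y|b)]
    var_of_cond_mean = math.exp(2.0 * eta + sigma2) * (math.exp(sigma2) - 1.0)  # Var[E(y|b)]
    mean_of_cond_var = mean                                      # E[Var(y|b)] = E[lambda]
    return mean, mean_of_cond_var + var_of_cond_mean

def numeric_moments(eta, sigma2, n=20001, width=12.0):
    """Same quantities by trapezoidal integration over the N(0, sigma2) density of b."""
    sd = math.sqrt(sigma2)
    lo, h = -width * sd, 2.0 * width * sd / (n - 1)
    m1 = m2 = 0.0
    for k in range(n):
        b = lo + k * h
        w = h * (0.5 if k in (0, n - 1) else 1.0)
        dens = math.exp(-b * b / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
        lam = math.exp(eta + b)
        m1 += w * dens * lam          # E[lambda]
        m2 += w * dens * lam * lam    # E[lambda^2]
    return m1, m1 + (m2 - m1 * m1)    # E[lambda] + Var[lambda]

eta, sigma2 = 1.0, 1.0   # hypothetical values
print(closed_form_moments(eta, sigma2))
print(numeric_moments(eta, sigma2))
```

Here $V[E(y \mid b)] = e^{2\eta + \sigma^2}(e^{\sigma^2} - 1) \approx 34.5$, far from the link-scale variance $\sigma^2 = 1$. This illustrates why the right-hand side of (4.5) should be read as an approximation when the link is not the identity.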
GLMMs include several important classes of statistical models as special cases:
(a) Linear models (LMs): no random effects, an identity link function, and a normal distribution.

(b) Generalized linear models (GLMs): no random effects, a link function that may differ from the identity, and response variables that may be non-normally distributed.

(c) Linear mixed models (LMMs): random effects present, an identity link function, and a normal distribution assumed for the response variable.


GLMMs were formulated to overcome the shortcomings of LMMs, because there are many cases where the assumptions made in linear mixed models are inadequate. First, an LMM assumes that the relationship between the mean of the dependent variable (y) and the fixed and random effects (β, b) can be modeled through a linear function. This assumption is questionable, for example, when a researcher wishes to model the incidence of a disease or the success or failure of an event.

The remaining assumptions of an LMM are that the variance is not a function of the mean and that the random effects follow a normal distribution. The assumption of constant variance is not met when the response variable is binary (1, 0). In this case, the variance is $\pi(1 - \pi)$, which is a function of the mean $\pi$. Moreover, such a random variable can take only two values (0, 1), whereas a normal random variable can take any real number. Finally, the predictions from an LMM can take any real value, whereas the predictions for a binary variable are bounded in the interval (0, 1), since they are probabilities and cannot be negative.
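The two problems just described, a variance that depends on the mean and predictions that can leave the unit interval, are easy to see numerically. The Python sketch below uses hypothetical linear-predictor values. It compares a hypothetical identity-link fit ($0.5 + 0.15\eta$, chosen only for illustration) with the inverse logit, and evaluates the binary variance $\pi(1 - \pi)$ at each predicted probability.

```python
import math

def inv_logit(eta):
    """Inverse logit link: always returns a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))

def bernoulli_variance(pi):
    """Variance of a 0/1 response with success probability pi: pi(1 - pi)."""
    return pi * (1.0 - pi)

# Hypothetical linear-predictor values over a wide range of covariate settings.
etas = [-4.0, -1.0, 0.0, 1.0, 4.0]

# A hypothetical identity-link (LMM-style) fit can produce values outside [0, 1].
identity_preds = [0.5 + 0.15 * e for e in etas]

# The logit link keeps every predicted probability inside (0, 1).
logit_preds = [inv_logit(e) for e in etas]

for e, p_id, p_lg in zip(etas, identity_preds, logit_preds):
    print(f"eta={e:+.1f}  identity={p_id:+.3f}  logit={p_lg:.3f}  "
          f"var={bernoulli_variance(p_lg):.3f}")
```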

Historically, a number of options have been used to address some of these LMM problems, even though they are not the most appropriate. These include logarithmic transformations (log(y)), square root transformations ($\sqrt{y}$), arcsine transformations ($\arcsin\sqrt{y}$), and so on. However, the transformed data are then analyzed with linear mixed models, ignoring the fact that these models are not the most accurate, even when the analyst knows that the response variable does not satisfy the assumption of normality. These options are attractive because they are relatively simple and easy to implement using the LMM machinery. However, they sidestep, rather than solve, the problem that a linear mixed model is not the best model for analyzing such data.


4.4 The Inverse Link Function 


In a GLMM, the link function maps the conditional mean to the linear predictor of the model, $g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b}$. This linear predictor can be transformed to the observed data scale through the inverse link function. In other words, the inverse link function maps the value of the linear predictor for the ith observation to its conditional mean on the data scale. For example, suppose that we are conducting an experiment in which we count the number of undesirable weeds in a crop of interest after applying a set of treatments. The response variable is assumed to have a Poisson distribution with mean $\lambda_{ij}$, whose linear predictor is given by


$$\eta_{ij} = \eta + \tau_i + b_j$$


where $\eta$ is the intercept, $\tau_i$ is the fixed effect due to treatments, and $b_j$ is the random effect, assuming $b_j \sim N(0, \sigma_b^2)$.
To obtain the inverse link for the predictor


$$\log(\lambda_{ij}) = g(\lambda_{ij}) = \eta + \tau_i + b_j,$$


we exponentiate both sides of the previous equation, which gives the inverse link function:


$$\lambda_{ij} = e^{\eta + \tau_i + b_j}$$


which is denoted as $\lambda_{ij} = g^{-1}(g(\lambda_{ij})) = g^{-1}(\eta + \tau_i + b_j)$.
Therefore, in this example, $\lambda_{ij}$ depends on the linear predictor through the inverse link function, and the variance $\sigma_{ij}^2$ depends on $\lambda_{ij}$ through the variance function (for the Poisson distribution, $\sigma_{ij}^2 = \lambda_{ij}$).
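The chain from linear predictor to conditional mean to variance can be traced in a few lines of Python. The intercept, treatment effect, and block effect below are hypothetical model-scale values. The sketch applies the inverse log link and then the Poisson variance function, and checks that the log link recovers the linear predictor.

```python
import math

def log_link(lam):
    """Log link g: maps a mean count to the linear-predictor scale."""
    return math.log(lam)

def inv_log_link(eta):
    """Inverse link g^{-1}: maps the linear predictor back to a mean count."""
    return math.exp(eta)

# Hypothetical model-scale values: intercept, treatment effect, block effect.
intercept, tau_i, b_j = 1.2, 0.5, -0.3
eta_ij = intercept + tau_i + b_j      # linear predictor
lam_ij = inv_log_link(eta_ij)         # conditional mean count on the data scale
var_ij = lam_ij                       # Poisson variance function: Var(y | b) = lambda

print(f"eta={eta_ij:.4f}  lambda={lam_ij:.4f}  variance={var_ij:.4f}")
```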


4.5 The Variance Function


The variance function is used to model the nonconstant variability of the phenomenon under study. In GLMMs, the residual variability arises from two sources: the variability of the distribution of sampling units in an experimental arrangement (blocks, plots, locations, etc.) and the variability due to overdispersion. Overdispersion can be modeled in several ways. In a GLMM, the scale (dispersion) parameter $\phi$ is extremely important because it can either increase or decrease the variance of each observation in the model:


$$\text{Var}(y_{ij} \mid b_j) = \phi\, V(\mu_{ij})$$

where $V(\cdot)$ is the variance function.


If overdispersion exists, one way to account for it is to add a random effect for each observation to the linear predictor, or to add an overdispersion parameter (in SAS, with the _residual_ keyword in the RANDOM statement). Another alternative is to use a different distribution to model the dataset. For example, for count data, the two-parameter negative binomial (NB) distribution ($\lambda_{ij}$, $\phi$) can be used instead of the single-parameter Poisson distribution ($\lambda_{ij}$).
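The practical difference between the Poisson and negative binomial variance functions can be seen by tabulating them side by side. The Python sketch below assumes the NB parameterization $\text{Var}(y) = \lambda + \phi\lambda^2$ with a hypothetical scale $\phi = 0.5$. Under the Poisson, the variance-to-mean ratio is always 1. Under the NB, it is $1 + \phi\lambda$, so the extra variability grows with the mean.

```python
def poisson_variance(lam):
    """Poisson variance function: Var(y) = lambda."""
    return lam

def negbin_variance(lam, phi):
    """Negative binomial variance lambda + phi * lambda**2, with scale phi > 0."""
    return lam + phi * lam ** 2

phi = 0.5  # hypothetical scale parameter
for lam in (1.0, 5.0, 20.0):
    ratio = negbin_variance(lam, phi) / lam   # variance-to-mean ratio = 1 + phi * lambda
    print(f"lambda={lam:5.1f}  Poisson var={poisson_variance(lam):6.1f}  "
          f"NB var={negbin_variance(lam, phi):6.1f}  var/mean={ratio:.2f}")
```

Setting $\phi = 0$ recovers the Poisson variance, which is why the NB is a natural extension when counts are overdispersed.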


4.6 Specification of a GLMM 


A GLMM is composed of three parts: (1) fixed effects that convey systematic and 
structural differences in responses; (2) random effects that convey stochastic differ- 
ences between blocks or other random factors, as these effects allow generalizations 
Table 4.1 Common distributions with their respective link functions

Distribution        Link function      Distribution syntax                Link function syntax
Binomial            Logit or probit    dist=binomial (or bin, b)          link=logit or link=probit
Poisson             Log                dist=poisson (or poi)              link=log
Beta                Logit              dist=beta                          link=logit
Normal              Identity           dist=normal (or gaussian)          link=identity (or id)
Negative binomial   Log                dist=negbinomial (or negbin, nb)   link=log
Multinomial         Cumulative logit   dist=multinomial (or multi)        link=cumlogit (or clogit)


of the population from which the sampling units have been (randomly) sampled; and 
(3) distribution of errors. Thus, a complete definition of a GLMM is as follows: 


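$$\mathbf{y} \mid \mathbf{b} \sim f(\boldsymbol{\mu}, \phi) \quad \text{(conditional distribution)}$$
$$\mathbf{b} \sim N(\mathbf{0}, \mathbf{G}) \quad \text{(random effects)}$$
$$g(\boldsymbol{\mu}) = \boldsymbol{\eta} \quad \text{(link function)}$$
$$\boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} \quad \text{(linear predictor)}$$


where the distribution function $f(\cdot)$ is a member of the exponential family, $\boldsymbol{\eta}$ is the linear predictor, X and Z are the design matrices, and $\boldsymbol{\beta}$ and b are the unknown parameters for fixed and random effects, respectively.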

When fitting a GLMM, the data remain on the original measurement scale (data scale). However, when means are estimated from a linear function of the explanatory variables (the linear predictor), these means are on the model scale. The inverse link function is used to map the model scale back to the original data scale. This is not the same as transforming the original measurements to a different measurement scale. For example, applying the log transformation to counts followed by an analysis of variance (ANOVA) under a normal distribution is not the same as fitting a generalized linear model assuming a Poisson distribution with a log link (Gbur et al. 2012). In the first case, the least squares means are normally equal to the arithmetic means of the log-transformed data, so back-transforming them gives geometric means. In the second case, the means are inverse-linked to the data scale, and these may not be equal to the arithmetic means of the original sample.
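The contrast between "transform, then analyze" and "model the mean through a link" can be shown for the simplest case, a single group with no random effects. The Python sketch below uses hypothetical counts. Averaging the log counts and back-transforming gives the geometric mean. Fitting an intercept-only Poisson model with a log link, by Newton-Raphson on the score equation $\sum_i (y_i - e^{\beta_0}) = 0$, and applying the inverse link gives the arithmetic mean. As the text notes, once random block effects enter the model, the inverse-linked estimate need not equal the arithmetic mean either.

```python
import math

counts = [2, 5, 21, 7]   # hypothetical counts from one treatment group (all > 0 for the log)
n = len(counts)

arith_mean = sum(counts) / n

# (1) ANOVA on log(y): the mean of log(y), back-transformed, is the geometric mean.
geo_mean = math.exp(sum(math.log(y) for y in counts) / n)

# (2) Poisson GLM with log link, intercept only: Newton-Raphson on the score equation.
beta0 = 0.0
for _ in range(50):
    mu = math.exp(beta0)
    score = sum(y - mu for y in counts)   # d loglik / d beta0
    info = n * mu                         # Fisher information for beta0
    beta0 += score / info
glm_mean = math.exp(beta0)                # inverse link back to the data scale

print(f"arithmetic={arith_mean:.4f}  back-transformed log mean={geo_mean:.4f}  "
      f"Poisson GLM={glm_mean:.4f}")
```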

The distribution specifications in “ргос GLIMMIX" have default link functions, 
but it is always highly recommended to explicitly code the link function, since for 
some type of response variable, more than one alternative exists. This way, there is 
no doubt that an appropriate function was used. Using the wrong link function will 
lead to totally meaningless and incorrect results. Table 4.1 shows some common 
distributions, the appropriate link function, and the proper syntax for each. 

For a complete list, see the online Statistical Analysis Software (SAS/STAT) 
documentation for PROC GLIMMIX. 
4.7 Estimation of the Dispersion Parameter


The overall measures of fit compare the observed values of the response variable with the fitted (predicted) values. The dispersion parameter is unknown and therefore must be estimated. There are two common methods for estimating it. McCullagh (1983) proposed estimating the overdispersion parameter as follows:


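$$\hat{\phi} = \frac{(\mathbf{y} - \hat{\boldsymbol{\mu}})' \mathbf{V}_{\mu}^{-1} (\mathbf{y} - \hat{\boldsymbol{\mu}})}{N - p} = \frac{\text{Pearson's } \chi^2}{N - p}$$


where $\mathbf{V}_{\mu}$ is the diagonal matrix of the variance functions and $N - p$ is the degrees of freedom for lack of fit. Later, McCullagh and Nelder (1989) suggested using the deviance: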


$$\hat{\phi} = \frac{\text{Deviance}}{N - p} = \frac{-2\left[\ln L(M_1) - \ln L(M_2)\right]}{N - p}$$


Deviance is a global fit statistic that also compares fitted and observed values; however, its exact form depends on the likelihood function of the random component of the model. Deviance compares the maximum value of the likelihood function of the fitted model, $M_1$, with the maximum possible value of the likelihood function given the data. That maximum is attained by the saturated model, $M_2$, which has as many parameters as there are observations. Because $M_2$ reproduces the data exactly, it gives the highest possible value of the likelihood.

If the estimated dispersion parameter is significantly greater than 1, overdispersion exists; in other words, the variance is greater than the model implies (for Poisson data, greater than the mean). In that case, the parameter should be used to adjust the variance, because ignoring overdispersion can inflate test statistics. When the dispersion parameter is less than 1, the test statistics are more conservative, which is not considered a serious problem.
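Both dispersion estimators can be computed directly from observed counts and fitted means. The Python sketch below uses hypothetical counts and an intercept-only Poisson fit (every fitted mean equals the sample mean, so p = 1). It computes Pearson's $\chi^2/(N - p)$ with the Poisson variance function $V(\mu) = \mu$, and the Poisson deviance divided by $N - p$. The deviance uses the convention $y \log(y/\mu) = 0$ when $y = 0$.

```python
import math

def pearson_dispersion(y, mu, p):
    """Pearson chi-square / (N - p) for a Poisson model (variance function V(mu) = mu)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - p)

def deviance_dispersion(y, mu, p):
    """Poisson deviance / (N - p); the term y*log(y/mu) is taken as 0 when y = 0."""
    dev = 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                    for yi, mi in zip(y, mu))
    return dev / (len(y) - p)

y = [3, 0, 8, 1, 6, 2]              # hypothetical counts
mu = [sum(y) / len(y)] * len(y)     # intercept-only Poisson fit: mu_hat = mean(y)

print(f"Pearson chi2/DF  = {pearson_dispersion(y, mu, p=1):.4f}")
print(f"Deviance/DF      = {deviance_dispersion(y, mu, p=1):.4f}")
```

Both estimates are well above 1 for these hypothetical data, which would signal overdispersion relative to the Poisson assumption.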

The following example is intended to show how GLIMMIX in SAS estimates the 
dispersion parameter in a GLMM. 


Example An agronomist wants to test the effectiveness of a new herbicide offered 
on the market (we will denote this as herb_N) and compare it with the herbicide that 
has been used for several cycles (herb_C). The experimental arrangement used was a 
randomized complete block design as shown below (Table 4.2). 


The components of a GLMM with a Poisson response variable are listed below: 


Distribution: $y_{ij} \mid b_j \sim \text{Poisson}(\lambda_{ij})$


$b_j \sim N(0, \sigma_b^2)$
Table 4.2 Number of undesirable weeds per plot

Block   Herb C   Herb N
1       7        46
2       5        109
3       21       30
4       7        48
5       2        3
6       6
7       0        5
8       19       26


Linear predictor: $\eta_{ij} = \eta + \text{herbicide}_i + b_j$


Link function: $\log(\lambda_{ij}) = \eta_{ij}$


This model assumes that the slopes are the same for each herbicide. The following 
SAS code is used for the proposed model: 


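proc glimmix nobound method=laplace;
  class block trt;
  model count = trt / dist=poisson link=log;   /* Poisson response, log link */
  random block;                                  /* random block effects */
  lsmeans trt / ilink lines;                     /* data-scale means, letter groups */
run;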


Explanation The “method = "option is used to specify the method used to opti- 
mize the logarithm of the likelihood function. In “proc GLIMMIX,” there аге two 
popular methods: adaptive quadrature (quad) or Laplace (laplace), which are the 
preferred methods for categorical response variables. Both of these methods fit a 
conditional model. When the quadrature method is used (method = quad), subjects 
(individuals) must be declared in the random effects (e.g., for the above program, 
"random intercept/subject=block’’). In addition, processing random effects by sub- 
ject is more efficient than using the syntax “random block” random effects in blocks. 
The “dist” option is where you specify the probability distribution that is appropriate 
for the type of response; in this case, it is the Poisson distribution. The “link” option 
is for specifying the link function of the distribution. The “ddfm” option is omitted 
so that GLIMMIX uses — by default — the method for calculating the denominator 
degrees of freedom for the fixed effects tests that result from the model. The “ilink” 
option converts the estimates of the treatment means (Ismeans) on the model scale to 
the data scale. Finally, “proc GLIMMIX” supports the “lines” option, which adds 
letter groups to the mean differences resulting from using “Ismeans.” 


The most relevant parts of the SAS output for our purposes are shown in Tables 4.3 and 4.4. The fit statistics of the fitted model are shown in parts (a) and (b) of Table 4.3. The -2 log likelihood statistic is extremely
Table 4.3 Fit statistics and variance components

(a) Fit statistics (Akaike's information criterion (AIC), small-sample bias-corrected Akaike's information criterion (AICC), Bozdogan's consistent Akaike's information criterion (CAIC), Schwarz's Bayesian information criterion (BIC), Hannan and Quinn information criterion (HQIC))
-2 Log likelihood          175.35
AIC (smaller is better)    181.35
AICC (smaller is better)   183.35
BIC (smaller is better)    181.59
CAIC (smaller is better)   184.59
HQIC (smaller is better)   179.74

(b) Fit statistics for conditional distribution
-2 Log L (count | r. effects)                   139.03
Pearson's chi-square                            77.56
Pearson's chi-square/degrees of freedom (DF)    4.85

(c) Covariance parameter estimates
Cov Parm   Estimate   Standard error
Block      1.5590     0.8690

Table 4.4 Type III fixed effects tests and estimated least squares means

(a) Type III tests of fixed effects
Effect      Num DF   Den DF   F-value   Pr > F
Herbicide   1        7        101.34    <0.0001

(b) Trts least squares means
Trts     Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
Herb_C   1.4604     0.4696           7    3.11      0.0171      4.3076   2.0227
Herb_N   2.8947     0.4561           7    6.35      0.0004     18.0778   8.2447


useful for comparing nested models, whereas the various information criteria, such as Akaike's information criterion (AIC), AIC with small-sample bias correction (AICC), the Bayesian information criterion (BIC), Bozdogan's consistent AIC (CAIC), and the Hannan and Quinn information criterion (HQIC), are useful when comparing models that are not necessarily nested (part (a)). The table of fit statistics for the conditional distribution (part (b)) shows the sum of the independent contributions to the conditional -2 log likelihood, whose value is 139.03. It also shows Pearson's statistic divided by its degrees of freedom for the conditional distribution (Pearson's chi-square/DF), whose value is 4.85.

The estimated dispersion parameter ($\hat{\phi}$ = Pearson's chi-square/DF) is far from 1; in this case, $\hat{\phi} = 4.85$, which indicates strong overdispersion. This may be because the specified distribution of the data is not appropriate, the counts are too small, or the variance function was not correctly specified. The estimate of the variance component due to blocks is tabulated in part (c) of Table 4.2; its value is $\hat{\sigma}^2_{\text{block}} = 1.559$.
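The dispersion check is simple arithmetic, so it can be reproduced outside SAS. The Python sketch below is our own illustration, not part of the GLIMMIX output. It recomputes the ratio from the Pearson statistics reported for the Poisson fit (77.56) and for the negative binomial fit (9.21, Table 4.5). The divisor of 16 is an assumption inferred from the tables themselves (77.56/4.85 ≈ 16 and 9.21/0.58 ≈ 16). The cutoff of 1.5 used to flag overdispersion is a hypothetical rule of thumb, not a formal test.

```python
def pearson_dispersion(pearson_chi2: float, df: float) -> float:
    """Pearson chi-square divided by its degrees of freedom (estimate of phi)."""
    return pearson_chi2 / df


def looks_overdispersed(ratio: float, cutoff: float = 1.5) -> bool:
    # Hypothetical rule-of-thumb cutoff; values near 1 are consistent with the model
    return ratio > cutoff


# Pearson statistics for the herbicide data; df = 16 is inferred from the tables
poisson_ratio = pearson_dispersion(77.56, 16)   # about 4.85
negbin_ratio = pearson_dispersion(9.21, 16)     # about 0.58

print(f"Poisson: {poisson_ratio:.2f}, overdispersed: {looks_overdispersed(poisson_ratio)}")
print(f"Negative binomial: {negbin_ratio:.2f}, overdispersed: {looks_overdispersed(negbin_ratio)}")
```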


The fixed effects tests and least squares means are shown in Table 4.4. The type III tests of fixed effects (part (a)) indicate a highly significant difference in the effectiveness of the herbicides in weed suppression. The estimated means and their standard errors are tabulated under the “Mean” column (part (b)).

The “Estimate” column contains the least squares means (lsmeans) on the model scale, that is, estimates of the linear predictor, which for a log link is the logarithm of the expected count. They are computed from the parameter estimates obtained by maximizing the log likelihood. SAS always reports lsmeans on the model scale in its least squares means tables. The “Mean” column has been converted back to the data scale with the inverse link function requested by the “ilink” option. These values are estimates of the average counts for each treatment level (in this case, each herbicide type) on the data scale. When we report the results, we must replace the model-scale least squares means in the test tables with these estimates (the data-scale means in the “Mean” column).
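The back-transformation in the “Mean” column can be reproduced by hand. For the log link, the inverse link is $\exp(\hat{\eta})$. The data-scale standard error follows from the first-order delta method, $\text{SE}(\hat{\lambda}) \approx \exp(\hat{\eta})\,\text{SE}(\hat{\eta})$. The sketch below assumes that this delta-method formula is what the ilink option applies. Its inputs are the “Estimate” and “Standard error” values in Table 4.4(b), and its outputs agree with the “Mean” and “Standard error mean” columns to within rounding.

```python
import math


def inverse_log_link(eta: float, se_eta: float) -> tuple[float, float]:
    """Back-transform a log-scale least squares mean and its standard error.

    Delta method: d exp(eta) / d eta = exp(eta), so SE(mean) ~= exp(eta) * SE(eta).
    """
    mean = math.exp(eta)
    return mean, mean * se_eta


# Model-scale estimates and standard errors from Table 4.4(b)
lsmeans = {"Herb_C": (1.4604, 0.4696), "Herb_N": (2.8947, 0.4561)}

for trt, (eta, se) in lsmeans.items():
    mean, se_mean = inverse_log_link(eta, se)
    print(f"{trt}: mean = {mean:.4f}, SE = {se_mean:.4f}")
```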


Since there is strong overdispersion ($\hat{\phi} > 1$), assuming that the data follow a Poisson distribution is risky, because the Poisson distribution implies that the mean and variance are equal. A useful alternative is the negative binomial distribution, which has mean $\lambda$ and variance $\lambda + \phi\lambda^2$, where $\phi > 0$ is commonly known as the scale parameter.
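To see why the negative binomial absorbs extra variation, compare the two variance functions. Under the Poisson, $\mathrm{Var}(y) = \lambda$. Under the negative binomial parameterization above, $\mathrm{Var}(y) = \lambda + \phi\lambda^2$, so the variance-to-mean ratio, $1 + \phi\lambda$, grows with the mean. The value $\phi = 0.1$ in this Python sketch is hypothetical and serves only as an illustration.

```python
def poisson_variance(mean: float) -> float:
    # Poisson: the variance equals the mean
    return mean


def negbin_variance(mean: float, phi: float) -> float:
    # Negative binomial: variance = mean + phi * mean**2, with phi > 0
    if phi <= 0:
        raise ValueError("phi must be positive")
    return mean + phi * mean ** 2


phi = 0.1  # hypothetical scale parameter
for mean in (2.0, 5.0, 20.0):
    ratio = negbin_variance(mean, phi) / mean
    print(f"mean={mean:5.1f}  Poisson var={poisson_variance(mean):6.1f}  "
          f"NB var={negbin_variance(mean, phi):6.1f}  NB var/mean={ratio:.2f}")
```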

The following is the specification of the components of a GLMM with a negative 
binomial (NB) response variable: 


Distribution: $y_{ij} \mid b_j \sim \text{Negative binomial}(\lambda_{ij}, \phi)$, $b_j \sim N(0, \sigma^2_{\text{block}})$

Linear predictor: $\eta_{ij} = \eta + \text{herbicide}_i + b_j$

Link function: $\log(\lambda_{ij}) = \eta_{ij}$


The GLIMMIX procedure also allows fitting a GLMM with a negative binomial response variable:


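proc glimmix data=itam nobound method=laplace;  /* Laplace approximation to the likelihood */
   class block trts;
   model count = trts / dist=negbin;            /* negative binomial response (log link by default) */
   random block;                                /* random block effect */
   lsmeans trts / ilink;                        /* lsmeans on the model and data scales */
run;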


Part of the output is shown in Table 4.5. When a conditional distribution is specified, the GLIMMIX procedure provides both the fit statistics for model comparison (part (a)) and those for the conditional distribution (part (b)). The previous analysis showed overdispersion under the Poisson assumption. Under a negative binomial distribution, this problem no longer exists ($\hat{\phi} = 0.58$). In other words, the negative binomial distribution fits these data better than the Poisson distribution because it accounts for the extra variation.

Table 4.5 Fit statistics under a negative binomial distribution

(a) Fit statistics

-2 Log likelihood              120.50
AIC (smaller is better)        128.50
AICC (smaller is better)       132.13
BIC (smaller is better)        128.81
CAIC (smaller is better)       132.81
HQIC (smaller is better)       126.35

(b) Fit statistics for conditional distribution

-2 Log L (count | r. effects)   120.50
Pearson's chi-square            9.21
Pearson's chi-square/DF         0.58

(c) Fit statistics and Pearson's chi-square/DF

                                          Poisson   Negative binomial
-2 Log likelihood                         175.35    120.50
AIC (smaller is better)                   181.35    128.50
AICC (smaller is better)                  183.35    132.13
BIC (smaller is better)                   181.59    128.81
CAIC (smaller is better)                  184.59    132.81
HQIC (smaller is better)                  179.74    126.35
$\hat{\phi}$ (Pearson's chi-square/DF)    4.85      0.58

Comparing the fit statistics under both distributions in part (c) of Table 4.5, we observe that all the information criteria are smaller when the data are modeled with a negative binomial distribution than with a Poisson distribution. In addition, the dispersion estimate is below 1 ($\hat{\phi} = 0.58$) instead of well above it. Both results indicate that the negative binomial distribution models this dataset better.
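The information criteria in Table 4.5(c) follow from the -2 log likelihood (-2LL), the number of estimated parameters k, and a sample size. The Python sketch below uses the usual formulas: AIC = -2LL + 2k, AICC = -2LL + 2kn/(n - k - 1), BIC = -2LL + k log n*, CAIC = -2LL + k(log n* + 1), and HQIC = -2LL + 2k log log n*. Its inputs are assumptions inferred by matching the tabulated values, not settings stated in the text:

- n = 16 observations for AICC;
- n* = 8 for BIC, CAIC and HQIC (the number of blocks, consistent with the 7 denominator DF);
- k = 3 for the Poisson model (two fixed effects plus the block variance);
- k = 4 for the negative binomial model (the same parameters plus the scale parameter).

```python
import math


def information_criteria(m2ll: float, k: int, n_obs: int, n_subj: int) -> dict[str, float]:
    """Information criteria from a -2 log likelihood (m2ll).

    k      : number of estimated parameters
    n_obs  : sample size in the small-sample correction (AICC)
    n_subj : sample size in BIC, CAIC and HQIC (assumed to be the number of blocks)
    """
    log_n = math.log(n_subj)
    return {
        "AIC": m2ll + 2 * k,
        "AICC": m2ll + 2 * k * n_obs / (n_obs - k - 1),
        "BIC": m2ll + k * log_n,
        "CAIC": m2ll + k * (log_n + 1),
        "HQIC": m2ll + 2 * k * math.log(log_n),
    }


poisson = information_criteria(175.35, k=3, n_obs=16, n_subj=8)
negbin = information_criteria(120.50, k=4, n_obs=16, n_subj=8)
for name in poisson:
    print(f"{name:5s} Poisson {poisson[name]:7.2f}   NegBin {negbin[name]:7.2f}")
```

The same function can be applied to Table 5.7 later in this chapter. With n_obs = 35, n_subj = 5, and k = 8 (Poisson) or k = 9 (negative binomial), again inferred rather than stated, it reproduces part (a) of that table to within rounding.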


4.8 Estimation and Inference in Generalized Linear Mixed Models


4.8.1 Estimation 


In GLMMs, inference involves estimating and testing hypotheses about the unknown parameters in β, G, and R, as well as obtaining the best linear unbiased predictions (BLUPs) of the random effects, b. In most modern statistical tools, including GLMMs, parameter fitting is performed via maximum likelihood (ML) or methods derived from it. For simple analyses in which the response variables are normal, classical ANOVA methods are based on differences between sums of squares and produce the same results as ML estimation. However, this equivalence does not hold in models with more complex structures, such as LMMs or GLMMs. To find the ML estimators in GLMMs, one must integrate over all possible values of the random effects. For GLMMs, this computation is at best slow and at worst (with a large number of random effects) computationally infeasible.

Statisticians have proposed several ways to approximate the parameter estimates of a GLMM. These include penalized quasi-likelihood (PQL) and pseudo-likelihood methods (Schall 1991; Wolfinger and O'Connell 1993; Breslow and Clayton 1993), Laplace approximations (Raudenbush et al. 2000), Gauss-Hermite quadrature (Pinheiro and Chao 2006), and Bayesian methods based on Markov chain Monte Carlo (Gilks et al. 1996). In all these approaches, researchers must distinguish between two forms of estimation. Standard ML estimation estimates the standard deviations of the random effects assuming that the fixed effects estimates are exactly correct. Restricted maximum likelihood (REML) is a variant that averages over the uncertainty in the fixed effects parameters (Pinheiro and Bates 2000; Littell et al. 2006).

The ML method underestimates the standard deviations of random effects, except in very large datasets, but it is the most useful method for comparing models with different fixed effects. Pseudo- and quasi-likelihood methods are the simplest and most widely used approximations for GLMMs. They are implemented in many statistical packages, which has promoted the use of GLMMs in ecology, biology, and quantitative and evolutionary genetics (Breslow 2004). Unfortunately, pseudo- and quasi-likelihood methods produce biased parameter estimates if the standard deviations of the random effects are large, especially with binary data (Rodriguez and Goldman 2001; Goldstein and Rasbash 1996). Lee and Nelder (2001) have implemented several improvements to PQL, but these are not available in most common statistical software packages.

As a rule of thumb, PQL performs poorly in two situations (Breslow 2004): for Poisson data when the average number of counts per treatment combination is less than five, and for binomial data when the expected numbers of successes and failures for each observation are less than five. Another disadvantage of PQL is that it computes a quasi-likelihood rather than the true likelihood. Because of this, many statisticians believe that PQL-based methods should not be used for inference.

Two more accurate approximations are available, and both also reduce bias. The first is the Laplace approximation (Raudenbush et al. 2000), which approximates the true likelihood of a GLMM rather than a quasi-likelihood, so that maximum likelihood can be used in GLMM inference. The second is Gauss-Hermite quadrature (Pinheiro and Chao 2006), which is more accurate than the Laplace approximation but slower (it requires more computational resources). The approximate procedures for estimating the parameters of a GLMM are therefore as follows:


The penalized quasi-likelihood method alternates between two steps: (1) estimating the fixed parameters by fitting a GLM with a variance-covariance matrix based on an LMM fit, and (2) estimating the variances and covariances by fitting an LMM with unequal variances calculated from the previous GLM fit. Pseudo-likelihood, a close cousin of PQL, estimates the variances differently and estimates a scale parameter to account for overdispersion (some authors use these terms interchangeably).

In summary, GLMMs require an iterative process for parameter estimation. SAS uses two categories of iterative procedures: linearization and integral approximation. In the GLIMMIX procedure, linearization is carried out with the pseudo-likelihood method, and integral approximation uses the Laplace approximation or adaptive Gauss-Hermite quadrature. The integral approximation methods maximize an approximation to the log likelihood for distributions in the exponential family, including non-normal distributions. The pseudo-likelihood method is the default in the GLIMMIX procedure (PROC GLIMMIX). The Laplace method and quadrature are both approximations to maximum likelihood, but the Laplace method is computationally simpler than quadrature and also provides excellent estimates.
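The following Python sketch contrasts the Laplace approximation with adaptive Gauss-Hermite quadrature. It approximates the marginal log likelihood of a single block in a Poisson model with a normal random intercept, $\log \lambda_j = \eta_j + b$, $b \sim N(0, \sigma^2)$. It is a didactic sketch, not GLIMMIX's implementation, and the counts, linear predictor values and variance are hypothetical. Both methods start from the mode $\hat{b}$ of the integrand and the curvature there. Adaptive quadrature then places Gauss-Hermite nodes around $\hat{b}$, and with a single node it reduces exactly to the Laplace approximation.

```python
import math

import numpy as np


def log_integrand(b, y, eta, sigma2):
    """log of prod_j Poisson(y_j | exp(eta_j + b)) times the N(0, sigma2) density of b."""
    lin = eta + b
    log_lik = np.sum(y * lin - np.exp(lin)) - sum(math.lgamma(int(v) + 1) for v in y)
    log_prior = -0.5 * b * b / sigma2 - 0.5 * math.log(2.0 * math.pi * sigma2)
    return float(log_lik + log_prior)


def mode_and_curvature(y, eta, sigma2, tol=1e-12, max_iter=100):
    """Newton-Raphson for the mode b_hat of the integrand; returns (b_hat, h''(b_hat))."""
    b = 0.0
    for _ in range(max_iter):
        mu = np.exp(eta + b)
        grad = np.sum(y - mu) - b / sigma2
        hess = -np.sum(mu) - 1.0 / sigma2  # always negative: the integrand is log-concave
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    hess = -np.sum(np.exp(eta + b)) - 1.0 / sigma2
    return float(b), float(hess)


def laplace_loglik(y, eta, sigma2):
    """Laplace: log integral ~= h(b_hat) + 0.5 * log(2*pi / -h''(b_hat))."""
    b_hat, hess = mode_and_curvature(y, eta, sigma2)
    return log_integrand(b_hat, y, eta, sigma2) + 0.5 * math.log(2.0 * math.pi / -hess)


def adaptive_gh_loglik(y, eta, sigma2, n_nodes):
    """Adaptive Gauss-Hermite: nodes centred at b_hat and scaled by the curvature."""
    b_hat, hess = mode_and_curvature(y, eta, sigma2)
    scale = math.sqrt(2.0 / -hess)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # integral of exp(h(b)) db = scale * sum_k w_k * exp(h(b_hat + scale * x_k) + x_k**2)
    terms = np.array([log_integrand(b_hat + scale * xk, y, eta, sigma2) + xk ** 2 for xk in x])
    top = terms.max()  # log-sum-exp for numerical stability
    return math.log(scale) + top + math.log(np.sum(w * np.exp(terms - top)))


# Hypothetical block: four counts, common linear predictor log(3), sigma^2 = 0.5
y = np.array([3, 5, 2, 4])
eta = np.full(4, math.log(3.0))
sigma2 = 0.5

print("Laplace            :", round(laplace_loglik(y, eta, sigma2), 6))
print("Adaptive GH, 1 node:", round(adaptive_gh_loglik(y, eta, sigma2, 1), 6))
print("Adaptive GH, 20    :", round(adaptive_gh_loglik(y, eta, sigma2, 20), 6))
```

Increasing the number of nodes buys accuracy at the cost of more evaluations of the integrand. This is the speed-accuracy trade-off between the two methods described above.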


4.8.2 Inference 


After estimating the parameter values in a GLMM, the next step is to extract information and draw statistical conclusions from the dataset. This involves careful analysis of the parameter estimates (confidence intervals, hypothesis tests) and selecting a model that best describes or explains the variability in the dataset. Inference is generally based on one of three approaches: (a) hypothesis testing, (b) model comparison, and (c) Bayesian approaches.

Hypothesis testing compares a test statistic (e.g., the F-test in ANOVA) with its expected distribution under the null hypothesis (H0) and computes a P-value to determine whether H0 can be rejected. Model selection, on the other hand, compares the fits of candidate models. Models can be selected using hypothesis tests, that is, by testing simpler models against more complex models in which they are nested (Stephens et al. 2005), for example with Wald tests (Z, χ², t, and F). They can also be selected using information-theoretic approaches. In model selection, likelihood ratio (LR) tests can assess the significance of factors or choose the better of a pair of nested candidate models. Information criteria, in contrast, allow multiple comparisons and the selection of non-nested models.

Among these criteria are the Akaike information criterion (AIC) and related criteria, which use the deviance as a measure of fit and add a term that penalizes more complex models. Information criteria can provide better estimates. Variants of AIC are commonly used in three situations: when sample sizes are not large (AICC), when there is overdispersion in the data (quasi-AIC, QAIC), and when one wishes to identify the number of parameters in a model (Bayesian information criterion, BIC).
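A likelihood ratio test between nested models can be illustrated with the herbicide data. The Poisson model is the limit of the negative binomial model as $\phi \to 0$, so the two fits in Table 4.5(c) differ by one parameter and can be compared through the difference of their -2 log likelihoods. Because $\phi = 0$ lies on the boundary of the parameter space, the naive $\chi^2_1$ P-value is conservative. A commonly used adjustment halves it, treating the null distribution as a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$. The Python sketch below is our illustration, not output reported in this chapter.

```python
import math


def lrt_nested(m2ll_reduced: float, m2ll_full: float, boundary: bool = False) -> tuple[float, float]:
    """Likelihood ratio test for one extra parameter.

    Returns (statistic, p_value), using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    With boundary=True the p-value is halved (50:50 mixture of chi2_0 and chi2_1),
    the usual adjustment when the extra parameter is tested at the edge of its range.
    """
    stat = m2ll_reduced - m2ll_full
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, (p / 2.0 if boundary else p)


# -2 log likelihoods from Table 4.5(c): Poisson (reduced) vs negative binomial (full)
stat, p = lrt_nested(175.35, 120.50, boundary=True)
print(f"LR statistic = {stat:.2f}, P = {p:.2e}")
```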
4.9 Fitting the Model


The mathematics behind a GLMM is quite complex. It is difficult to conceptualize 
the use of constructs such as distributions, link functions, log likelihood, and quasi- 
likelihood when fitting a model. Perhaps the following points will help explain the 
modeling process. 


(a) An analysis of variance model is a vector of linear predictors (equation) with unknown parameters.

(b) Each distribution has a corresponding probability function.

(c) The vector of linear predictors, transformed to means by the inverse link, is substituted into the likelihood function.

(d) Solutions for the parameter estimates are found by minimizing the negative of the log likelihood function (-log likelihood).

(e) The means (least squares means, lsmeans) are derived from the parameter estimates and are on the model scale.

(f) The inverse link function converts the mean estimates from the model scale to the original data scale.


The key concepts of PROC GLIMMIX are that (1) it uses a distribution to estimate the model parameters; it does not fit the data to a distribution, and (2) the data values are not transformed by the link function; the inverse link function converts the means (least squares means) to the data scale after estimation on the model scale.
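Steps (a) to (f) can be traced in a small Python sketch for a one-way Poisson model without random effects (a GLM). The counts are hypothetical. The sketch follows the convention visible in Table 5.3, where the last level is the reference and its effect is set to 0, so the design matrix has an intercept plus one indicator per non-reference level. Newton-Raphson minimizes the negative log likelihood (step d), and the least squares means are formed on the model (log) scale (step e). Applying exp(·), the inverse of the log link, then returns them to the data scale (step f). For this one-way model, the data-scale means should equal the group averages, which serves as a check.

```python
import numpy as np

# Hypothetical counts for three treatments (3 replicates each); the last level is the reference
y = np.array([4, 6, 5, 10, 12, 11, 20, 18, 22], dtype=float)
trt = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
n_levels = 3

# (a) linear predictor eta = X @ beta: intercept + indicators for levels 0..n_levels-2
X = np.column_stack([np.ones_like(y)] + [(trt == k).astype(float) for k in range(n_levels - 1)])

# (b)-(d) Poisson log likelihood maximized (negative log likelihood minimized) by Newton-Raphson
beta = np.zeros(X.shape[1])
beta[0] = np.log(y.mean())            # start from the overall mean on the log scale
for _ in range(50):
    lam = np.exp(X @ beta)            # (c) inverse link turns the linear predictor into means
    score = X.T @ (y - lam)           # gradient of the log likelihood
    info = X.T @ (X * lam[:, None])   # information matrix X' diag(lam) X
    step = np.linalg.solve(info, score)
    beta += step
    if np.max(np.abs(step)) < 1e-10:
        break

# (e) least squares means on the model (log) scale: intercept + effect (reference effect = 0)
effects = np.append(beta[1:], 0.0)
eta_means = beta[0] + effects

# (f) inverse link back to the data scale
means = np.exp(eta_means)
print("model scale:", np.round(eta_means, 4))
print("data scale :", np.round(means, 4))
```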


4.10 Exercises 


1. As a simple example of these types of data, consider the following results of an 
experiment on wheat germination, carried out in pots under glass. The experiment 
consisted of four blocks of six treatments (Table 4.6). 


(a) According to the response variable, what type(s) of probability distribution do 
you suggest for the variable? 

(b) Construct a GLMM to study the effect of treatments on seed germination. 

(c) Analyze the dataset according to the model proposed in (b). Is the probability distribution proposed in (a) adequate?

(d) Is there a significant difference in the proportion of germinated seeds between 
treatments? 


Table 4.6 Number of seeds 
not germinating (out of 50) 
Table 4.7 Control of cockchafer larvae


       A     B     C     D     E     F     G     H

       a b   a b   a b   a b   a b   a b   a b   a b
Trl 3 7 qu 0 13 Ն [7 |1 10 4 |13 
Trt2 |4 3 |3 2 3 |1 |6 3 5 |4 |11 
Тиз 3 10 6 21211711 8 7 |10 
Тм |5 8 4 2 |7 13 |7 |0 3 |3 12 
Ти 4 6 4 01054 |I 6 |1 8 


2. Table 4.7 shows the counts per sample area of a type of cockchafer larva (two age groups, a and b). The experiment consisted of five treatments in eight randomized blocks, with two age groups, to study the differential effects of the treatments on insect age.


(a) Considering the type of response in this exercise, what type(s) of probability distribution(s) do you suggest for it?

(b) Construct a GLMM to study the effect of the treatments and the age of the cockchafer larvae.

(c) Analyze the dataset according to the model proposed in (b).

(d) Is the model used in (b) sufficient? If so, discuss your findings.
Chapter 5
Generalized Linear Mixed Models for Counts


5.1 Introduction 


Data in the form of counts regularly appear in studies that investigate the number of occurrences of an event. Examples include the number of insects, birds, or weeds in agricultural or agroecological studies; the number of plants transformed or regenerated using modern breeding techniques; the number of individuals with a certain disease in a medical study; and the number of defective products in a quality improvement study. These counts can be recorded per unit of time, area, or volume.

When a generalized linear model (GLM) with a Poisson distribution is used, it is often found that there is excess dispersion (extra variation) that the Poisson model does not capture. In these cases, the data are often modeled with a negative binomial distribution, which has the same mean as the Poisson distribution but a variance greater than the mean. Most experiments also have some form of structure, either from the experimental design (completely randomized design (CRD), randomized complete block design (RCBD), incomplete block, or split-plot design) or from the sampling design. This structure must be incorporated into the predictor to adequately model the data.


5.2 The Poisson Model 


A Poisson random variable with parameter $\lambda$ is a discrete random variable whose distribution belongs to the exponential family. Its probability function is

$$f(y) = \frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots$$
The mean and variance of a Poisson random variable are equal, i.e., $E(y) = \mathrm{Var}(y) = \lambda$. A Poisson distribution is often used to model responses that are “counts.” As $\lambda$ increases, the Poisson distribution becomes more symmetric and can eventually be reasonably approximated by a normal distribution.

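Let $y_{ij}$ be the value of the count variable associated with unit i at level one and with unit j at level two, given a set of explanatory variables. We can then express its probability function as

$$f(y_{ij}) = \frac{e^{-\lambda_{ij}}\lambda_{ij}^{y_{ij}}}{y_{ij}!},$$

and the logarithm of the likelihood is given by

$$\log f(y_{ij}) = \log\left(\frac{e^{-\lambda_{ij}}\lambda_{ij}^{y_{ij}}}{y_{ij}!}\right) = -\lambda_{ij} + y_{ij}\log(\lambda_{ij}) - \log(y_{ij}!).$$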
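Both the log-likelihood contribution and the equality of mean and variance can be checked numerically. The Python sketch below evaluates $\log f(y) = -\lambda + y\log\lambda - \log y!$, using `math.lgamma(y + 1)` for $\log y!$. It then sums the probability function over a truncated support to confirm that the probabilities add up to 1 and that the mean and the variance both equal $\lambda$. The value $\lambda = 4.5$ and the truncation point are arbitrary choices for illustration.

```python
import math


def poisson_logpmf(y: int, lam: float) -> float:
    # log f(y) = -lambda + y*log(lambda) - log(y!), with log(y!) = lgamma(y + 1)
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


lam = 4.5                  # arbitrary mean for illustration
support = range(0, 100)    # truncation: P(Y >= 100) is negligible for lambda = 4.5
probs = [math.exp(poisson_logpmf(y, lam)) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```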


A Poisson distribution has very particular mathematical properties that are used when we model “counts.” For example, the expected value of y is equal to the variance of y, such that

$$E(y_{ij}) = \mathrm{Var}(y_{ij}) = \lambda_{ij}.$$


Hence, $\lambda_{ij}$ is necessarily a nonnegative number. Using the identity link function in this context could therefore lead to difficulties, because a linear predictor can take negative values. The natural logarithm is mainly used as the link function for expected “counts.” For single-level (factor) data, a Poisson regression model is used, in which we work with the natural logarithm of the expected counts, $\log(\lambda)$. For multilevel data (more than two factors), mixed models for Poisson data are a better choice for modeling the logarithm of the expected counts, $\eta$.

Suppose that, given the random effects $b_j$, the counts $y_1, y_2, \ldots, y_n$ are conditionally independent, such that $y_{ij} \mid b_j \sim \text{Poisson}(\lambda_{ij})$, where

$$\log(\lambda_{ij}) = \eta + \tau_i + b_j.$$

This is a special case of a generalized linear mixed model (GLMM) in which the link function for this family of distributions is $g(\lambda_{ij}) = \log(\lambda_{ij})$. The dispersion parameter $\phi$, in this case, is equal to 1.

If the counts are very large, their distribution can sometimes be approximated by a continuous distribution. In particular, if all the counts are large enough, the square root of the counts can be used to fit the model because this transformation stabilizes the variance. However, as mentioned in previous chapters, the estimation process under normality can be problematic, because it can produce negative fitted values and predictions, which are illogical for counts.
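The variance-stabilizing effect of the square root can be checked exactly, without simulation, by summing over the Poisson probability function. For large $\lambda$, $\mathrm{Var}(\sqrt{y})$ approaches 1/4 regardless of $\lambda$, whereas $\mathrm{Var}(y) = \lambda$ grows with the mean. The $\lambda$ values in this Python sketch are arbitrary illustrations.

```python
import math


def var_sqrt_poisson(lam: float) -> float:
    """Exact Var(sqrt(Y)) for Y ~ Poisson(lam), summing the pmf up to a wide truncation point."""
    upper = int(lam + 20 * math.sqrt(lam) + 50)   # the tail beyond this point is negligible
    probs = [math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1)) for y in range(upper + 1)]
    e_sqrt = sum(p * math.sqrt(y) for y, p in enumerate(probs))   # E[sqrt(Y)]
    e_y = sum(p * y for y, p in enumerate(probs))                 # E[Y] = E[(sqrt(Y))^2]
    return e_y - e_sqrt ** 2


for lam in (2.0, 10.0, 50.0, 200.0):
    print(f"lambda = {lam:6.1f}: Var(y) = {lam:6.1f}, Var(sqrt(y)) = {var_sqrt_poisson(lam):.4f}")
```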
5.2.1 CRD with a Poisson Response


A CRD is a design in which a fixed number t of treatments is randomly assigned to the experimental units. The linear predictor describing the mean structure of this GLM is

$$\eta_{ij} = \eta + \tau_i,$$

where $\eta_{ij}$ denotes the link function (linear predictor) for the jth observation of the ith treatment, $\eta$ is the intercept, and $\tau_i$ is the fixed effect due to treatment i ($i = 1, 2, \ldots, t$; $j = 1, 2, \ldots, r_i$), with t treatments and $r_i$ replicates of treatment i.


Example Effect of a subculture on the number of shoots during micropropagation 
of sugarcane. 


The objective of micropropagation in sugarcane is to produce vegetative material 
identical to the donor so that its genetic integrity is preserved. Despite this, 
somaclonal variation has been observed in plants derived from in vitro culture 
regardless of explant, variety, ploidy level, number of subcultures, and generation 
route used, among others. A total of 8 explants were planted in temporary immersion bioreactors (explant/bioreactor) to determine whether the number of subcultures (10 subcultures) influences the number of shoots observed per explant. In this example, we have $r_i$ observations ($j = 1, 2, \ldots, r_i$) on each of the 10 subcultures ($i = 1, 2, \ldots, 10$) in a completely randomized design (Appendix 1: Data: Subcultures). The analysis of variance (ANOVA) table for this model (Table 5.1) is given below:

The components of the GLM are set out below:

Distribution: $y_{ij} \sim \text{Poisson}(\lambda_{ij})$
Linear predictor: $\eta_{ij} = \eta + \tau_i$
Link function: $\log(\lambda_{ij}) = \eta_{ij}$

where $y_{ij}$ denotes the number of shoots observed in subculture i, explant j ($i = 1, 2, \ldots, 10$; $j = 1, 2, \ldots, r_i$), $\eta_{ij}$ is the ijth link function (linear predictor), $\eta$ is the intercept, and $\tau_i$ is the fixed effect of subculture i.


Table 5.1 Analysis of variance

Sources of variation   Degrees of freedom
                       Unbalanced design              Balanced design
Subculture             $t - 1 = 10 - 1 = 9$           $t - 1$
Error                  $\sum_i r_i - t = 164$         $t(r - 1)$
Total                  $\sum_i r_i - 1 = 173$         $tr - 1$
Table 5.2 Model information

(a) Model information and estimation methods

Dataset                      WORK.SUGAR
Response variable            NB
Response distribution        Poisson
Link function                Log
Variance function            Default
Variance matrix              Diagonal
Estimation technique         Maximum likelihood
Degrees of freedom method    Residual

(b) Dimensions

Columns in X                 11
Columns in Z                 0
Subjects (blocks in V)       1
Max obs per subject          174


The following Statistical Analysis Software (SAS) code analyzes a CRD with a Poisson response.


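proc glimmix data=sugar method=laplace;
   class rep1 sub1;
   model nb = sub1 / dist=poisson s link=log;   /* Poisson response, log link, fixed effects solution */
   lsmeans sub1 / lines ilink;                   /* mean comparisons; means on the data scale */
run;
quit;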


While most of the statements used have been explained before, the options “dist,” “s,” and “link” in the model statement specify the distribution of the data, request the fixed effects solution, and specify the link function, respectively. In addition, in the “lsmeans” (least squares means) statement, the “lines” option asks the GLIMMIX procedure for mean comparisons, and the “ilink” option applies the inverse link function to report the means on the data scale.

Part of the output is shown in Table 5.2, where part (a) shows the model and the 
methods used to fit the statistical model, whereas part (b) lists the dimensions of the 
relevant matrices in the model specification. 

Due to the absence of random effects in this model, there are no columns in 
matrix Z. The 11 columns in matrix X comprise an intercept and 10 columns for the 
effect of subcultures. 

The goodness-of-fit statistics of the model are shown in part (a) of Table 5.3. The value of the generalized chi-square statistic divided by its degrees of freedom (DF) is less than 1 (Pearson's chi-square/DF = 0.79). This indicates that there is no overdispersion and that the variability in the data has been adequately modeled with the Poisson distribution.

Part (b) of Table 5.3 shows the maximum likelihood (ML) parameter estimates (“Estimate”), their standard errors, and the t-tests of the hypotheses that the parameters equal zero.
Table 5.3 Fit statistics and estimated parameters

(a) Fit statistics (Akaike's information criterion (AIC), small-sample bias-corrected Akaike's information criterion (AICC), Bozdogan's consistent Akaike's information criterion (CAIC), Schwarz's Bayesian information criterion (BIC), and Hannan and Quinn information criterion (HQIC))

-2 Log likelihood                                          1062.11
Akaike information criterion (AIC) (smaller is better)     1082.11
AICC (smaller is better)                                   1083.46
Bayesian information criterion (BIC) (smaller is better)   1113.70
CAIC (smaller is better)                                   1123.70
HQIC (smaller is better)                                   1094.93
Pearson's chi-square                                       137.70
Pearson's chi-square/DF                                    0.79

(b) Parameter estimates

Effect      sub1   Parameter   Estimate   Standard error   DF    t-value   Pr > |t|
Intercept          η           3.6687     0.04124          164   88.96     <0.0001
sub1        1      τ1          -1.0809    0.07389          164   -14.63    <0.0001
sub1        2      τ2          -0.9043    0.06664          164   -13.57    <0.0001
sub1        3      τ3          -0.5596    0.06839          164   -8.18     <0.0001
sub1        4      τ4          -0.3412    0.06398          164   -5.33     <0.0001
sub1        5      τ5          0.2177     0.05540          164   3.93      0.0001
sub1        6      τ6          0.2257     0.05452          164   4.14      <0.0001
sub1        7      τ7          0.2631     0.05178          164   5.08      <0.0001
sub1        8      τ8          0.3387     0.05109          164   6.63      <0.0001
sub1        9      τ9          0.2684     0.05478          164   4.90      <0.0001
sub1        10     τ10         0


Part (a) of Table 5.4 shows the significance tests for the fixed effects in the model (“Type III tests of fixed effects”). These tests are Wald tests, not likelihood ratio tests. The effect of subculture on the number of shoots is highly significant in this model (P < 0.0001). This indicates that the 10 subcultures do not produce the same number of shoots; that is, the number of subcultures affects the average shoot production per explant.

The least squares means obtained with “lsmeans” (part (b) of Table 5.4) are the values under the column “Estimate.” These values and their standard errors were calculated with the linear predictor $\hat{\eta}_i = \hat{\eta} + \hat{\tau}_i$ and are on the model scale. The values in the “Mean” column and their standard errors are on the data scale. They were obtained by applying the inverse link to obtain the $\hat{\lambda}_i$ values, i.e., $\hat{\lambda}_i = \exp(\hat{\eta}_i)$, with their respective standard errors.
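Table 5.3 uses a set-to-zero parameterization in which the last subculture is the reference level ($\hat{\tau}_{10} = 0$). Under this parameterization, each least squares mean on the model scale is $\hat{\eta} + \hat{\tau}_i$, and the data-scale mean is its exponential. The Python sketch below rebuilds the “Estimate” and “Mean” columns of Table 5.4 from the values in Table 5.3. Small differences in the last digit are expected because the inputs are rounded.

```python
import math

intercept = 3.6687  # intercept estimate, Table 5.3(b)
# Subculture effects tau_1..tau_9 from Table 5.3(b); tau_10 = 0 (reference level)
tau = [-1.0809, -0.9043, -0.5596, -0.3412, 0.2177, 0.2257, 0.2631, 0.3387, 0.2684, 0.0]

for level, t in enumerate(tau, start=1):
    eta_i = intercept + t       # least squares mean on the model (log) scale
    lam_i = math.exp(eta_i)     # inverse link: mean number of shoots per explant
    print(f"subculture {level:2d}: estimate = {eta_i:.4f}, mean = {lam_i:.2f}")
```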

A comparison of means, obtained with the “lines” option, is presented in Fig. 5.1. In this figure, we can see that the average production is lowest in the first subcultures, increases with the number of subcultures up to subculture 8, and begins to decrease at subculture 9.
Table 5.4 Type III tests of fixed effects and least squares means (means)

(a) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
sub1     9        164      120.14    <0.0001

(b) sub1 least squares means

sub1   Estimate   Standard error   DF    t-value   Pr > |t|   Mean      Standard error mean
1      2.5878     0.06131          164   42.21     <0.0001    13.3000   0.8155
2      2.7644     0.05234          164   52.81     <0.0001    15.8696   0.8307
3      3.1091     0.05455          164   56.99     <0.0001    22.4000   1.2220
4      3.3274     0.04891          164   68.03     <0.0001    27.8667   1.3630
5      3.8864     0.03699          164   105.08    <0.0001    48.7333   1.8025
6      3.8944     0.03567          164   109.18    <0.0001    49.1250   1.7522
7      3.9318     0.03131          164   125.57    <0.0001    51.0000   1.5969
8      4.0073     0.03015          164   132.91    <0.0001    55.0000   1.6583
9      3.9370     0.03606          164   109.18    <0.0001    51.2667   1.8487
10     3.6687     0.04124          164   88.96     <0.0001    39.2000   1.6166
[Figure: bar chart of the average number of shoots per explant (y-axis) for subcultures 1 to 10 (x-axis: Subcultures)]

Fig. 5.1 Average number of shoots per subculture. Bars with different letters are statistically different at α = 0.05


5.2.2 Example 2: CRDs with Poisson Response 


Researchers want to determine whether applying a new growth compound to walnut trees changes the number of nuts produced per tree. The compound was applied at three different times (pre-flowering, 1; flowering, 2; and post-flowering, 3) and in two formulations (A and B). In addition to these treatments (Trt), there was a control (C), in which no compound was applied. In total, 7 treatments were randomly applied to the experimental units (trees), i.e., 35 trees, in a rectangular arrangement. The number of nuts $y_{ij}$ observed on each tree for each combination of formulation and time of application is given in Table 5.5.

Table 5.5 Number of nuts per tree ($y_{ij}$) in each of the combinations of the two factors

Trt   y_ij   Trt   y_ij   Trt   y_ij   Trt   y_ij   Trt   y_ij
B1    89     B3    118    B3    99     A1    21     C     11
B1    99     A3    99     A1    79     B2    118    B2    89
B2    158    C     50     B1    118    C     30     B3    158
A1    89     A2    127    A2    89     B2    99     B3    118

The components of the GLMM are listed below:

Distribution: $y_{ij} \mid r_j \sim \text{Poisson}(\lambda_{ij})$, $r_j \sim N(0, \sigma^2_{\text{tree}})$
Linear predictor: $\eta_{ij} = \eta + \tau_i + r_j$
Link function: $\log(\lambda_{ij}) = \eta_{ij}$

where $y_{ij}$ denotes the number of nuts in treatment i on tree j ($i = 1, 2, \ldots, 7$; $j = 1, 2, \ldots, 5$), $\eta_{ij}$ is the linear predictor, $\eta$ is the intercept, $\tau_i$ is the fixed effect due to treatment i, and $r_j$ is the random effect due to tree j.

The following SAS statements fit a GLMM in a completely randomized design with a Poisson response variable.


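proc glimmix data=crd_nuez nobound method=laplace;
   class trt rep;
   model count = trt / dist=poi link=log;   /* Poisson response with log link */
   random rep;                              /* random effect of rep (tree) */
   lsmeans trt / lines ilink;               /* mean comparisons; means on the data scale */
run;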


The options “dist” and “link” in the model statement specify the distribution of the data and the link function, respectively. In addition, in the “lsmeans” (least squares means) statement, the “lines” option requests mean comparisons, and the “ilink” option provides the means on the data scale by applying the inverse of the link function.

Part of the results is presented in Table 5.6. The Pearson statistic for the conditional distribution (part (a)) indicates strong overdispersion (Pearson's chi-square/DF = 3.62). The estimated variance component due to sampling in the experimental units (trees) is $\hat{\sigma}^2_{\text{tree}} = 0.0357$ (part (b)).
Table 5.6 Results of the analysis of variance

(a) Fit statistics for conditional distribution

-2 Log L (count | r. effects)   354.60
Pearson's chi-square            126.54
Pearson's chi-square/DF         3.62

(b) Covariance parameter estimates

Cov Parm   Estimate   Standard error
rep        0.03573    0.02362

(c) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      6        24       59.55     <0.0001


In addition, part (c) of Table 5.6 shows the type III tests of fixed effects, which indicate a significant difference between treatments in the average number of nuts per tree (P < 0.0001). However, it is not advisable to continue with the inference and analysis of this experiment. The extra variation in the data (commonly known as overdispersion; Pearson's chi-square/DF = 3.62) strongly affects the F-test and the standard errors of the means.

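An effective way to deal with overdispersion in the data is to use a distribution other than the Poisson. The negative binomial distribution is an excellent option for count data with overdispersion. Assume that the conditional distribution of the observations is given by

$$y_{ij} \mid r_j, u_{ij} \sim \text{Poisson}(\lambda_{ij} u_{ij}),$$

where $u_{ij} \sim \text{Gamma}(1/\phi, \phi)$ is a multiplicative gamma effect with mean 1 and variance $\phi$, $\phi$ is the scale parameter, and $r_j \sim N(0, \sigma^2_{\text{tree}})$. Integrating out $u_{ij}$ gives a negative binomial distribution with mean $\lambda_{ij}$ and variance $\lambda_{ij} + \phi\lambda_{ij}^2$. The resulting new GLMM is:

Distribution: $y_{ij} \mid r_j \sim \text{Negative binomial}(\lambda_{ij}, \phi)$, $r_j \sim N(0, \sigma^2_{\text{tree}})$
Linear predictor: $\eta_{ij} = \eta + \tau_i + r_j$
Link function: $\log(\lambda_{ij}) = \eta_{ij}$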
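The gamma-Poisson construction above can be checked by simulation. The Python sketch below uses NumPy's random `Generator`. It draws a multiplicative gamma effect with shape $1/\phi$ and scale $\phi$ (mean 1, variance $\phi$), multiplies it into a fixed Poisson mean $\lambda$, and compares the simulated mean and variance with $\lambda$ and $\lambda + \phi\lambda^2$. The values of $\lambda$, $\phi$, and the sample size are hypothetical, and the seed only makes the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(2024)     # fixed seed for reproducibility
lam, phi, n = 100.0, 0.06, 400_000    # hypothetical mean, scale parameter, and sample size

# Gamma effect with shape 1/phi and scale phi: mean 1, variance phi
u = rng.gamma(shape=1.0 / phi, scale=phi, size=n)
# Conditionally Poisson counts; marginally negative binomial with mean lam
y = rng.poisson(lam * u)

print(f"simulated mean {y.mean():8.2f}  vs  lambda                {lam:8.2f}")
print(f"simulated var  {y.var():8.2f}  vs  lambda + phi*lambda^2 {lam + phi * lam ** 2:8.2f}")
```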


The following GLIMMIX statements fit this model under a negative binomial distribution in a CRD:


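proc glimmix data=crd_nuez nobound method=laplace;
   class trt rep;
   model count = trt / dist=negbin link=log;   /* negative binomial response with log link */
   random rep;                                 /* random effect of rep (tree) */
   lsmeans trt / lines ilink;                  /* mean comparisons; means on the data scale */
run;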
Table 5.7 Poisson and negative binomial model fit statistics

(a) Fit statistics

                                Poisson   Negative binomial
-2 Log likelihood               374.83    328.03
AIC (smaller is better)         390.83    346.03
AICC (smaller is better)        396.37    353.23
BIC (smaller is better)         387.71    342.51
CAIC (smaller is better)        395.71    351.51
HQIC (smaller is better)        382.45    336.60

(b) Fit statistics for conditional distribution

                                Poisson   Negative binomial
-2 Log L (count | r. effects)   354.60    316.06
Pearson's chi-square            126.54    32.02
Pearson's chi-square/DF         3.62      0.91


Table 5.8 Variance component estimates and fixed effects tests

(a) Covariance parameter estimates

Cov Parm   Estimate   Standard error
Rep        0.04288    0.03398
Scale      0.06141    0.02428

(b) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      6        24       18.75     <0.0001


Part of the results is shown in Tables 5.7 and 5.8. The information criteria in part (a) of Table 5.7 help in choosing the model that best fits the dataset. All of them are smaller under the negative binomial distribution, which clearly provides the better fit to these data. The conditional fit statistics (part (b)) show that the Poisson model has strong overdispersion (Pearson's chi-square/DF = 3.62). Fitting the data with a negative binomial distribution removes this overdispersion (Pearson's chi-square/DF = 0.91).

Table 5.8 shows the variance component estimates (part (a)) and the type III tests of fixed effects (part (b)). The estimated variance due to trees is $\hat{\sigma}^2_{\text{tree}} = 0.04288$, and the estimated scale parameter (Scale) is $\hat{\phi} = 0.06141$. The type III tests of fixed effects (part (b)) show a highly significant effect of the treatments on the average number of nuts (P < 0.0001).

The values under the column “Estimate” are the estimates of the linear predictor $\eta_i$ (the model scale), and the values under “Mean” are the means $\lambda_i$ (the data scale). Both are given with their respective standard errors and were obtained with the “lsmeans” statement and the “ilink” option (Table 5.9). The results show that all the compound treatments applied in this experiment produced a higher average number of walnuts than the control treatment C. In general, formulation B applied to the walnut trees at the flowering stage (B2) showed the highest nut production.
Table 5.9 Estimates on the model scale ("Estimate") and means on the data scale ("Mean")

Trt least squares means

Trt   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
A1    4.0865     0.1560           24   26.19     <0.0001    59.5307   9.2890
A2    4.5624     0.1519           24   30.04     <0.0001    95.8162   14.5529
A3    4.5293     0.1519           24   29.82     <0.0001    92.6956   14.0783
B1    4.4349     0.1529           24   29.01     <0.0001    84.3417   12.8958
B2    4.7863     0.1504           24   31.82     <0.0001    119.86    18.0304
B3    4.7641     0.1504           24   31.67     <0.0001    117.23    17.6335
C     3.0499     0.1742           24   17.51     <0.0001    21.1140   3.6785
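Up to rounding, the "Mean" column of Table 5.9 can be reproduced by applying the inverse of the log link to the "Estimate" column. Its standard error is consistent with a delta-method approximation. The sketch below assumes the estimates are copied from the table, and the function name `to_data_scale` is illustrative.

```python
import math

# Model-scale estimates and standard errors copied from Table 5.9.
TABLE_5_9 = {
    "A1": (4.0865, 0.1560),
    "A2": (4.5624, 0.1519),
    "A3": (4.5293, 0.1519),
    "B1": (4.4349, 0.1529),
    "B2": (4.7863, 0.1504),
    "B3": (4.7641, 0.1504),
    "C": (3.0499, 0.1742),
}


def to_data_scale(eta, se_eta):
    """Inverse log link plus a delta-method standard error.

    mean = exp(eta); d(mean)/d(eta) = exp(eta), so se(mean) ~= exp(eta) * se(eta).
    """
    mean = math.exp(eta)
    return mean, mean * se_eta


for trt, (eta, se) in TABLE_5_9.items():
    mean, se_mean = to_data_scale(eta, se)
    print(f"{trt}: mean = {mean:.2f}, SE = {se_mean:.2f}")
```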


In the agricultural and biological sciences, researchers often need experiments that involve random effects (blocks, locations, etc.) and response variables that are not normally distributed. For example, suppose that a certain number of treatments are tested at several locations randomly selected from a sufficiently large population of locations. At each location, the experimental units are randomly assigned to the treatments. Let $y_{ij}$ be the number of (observed) individuals possessing the characteristic of interest in the ith treatment in the jth block. The model for the mean structure of this experiment is

$\eta_{ij} = \mu + \tau_i + b_j$

where $\mu$ is the intercept, $\tau_i$ is the fixed effect due to the ith treatment, and $b_j$ is the random effect of block j, with $b_j \sim N(0, \sigma^2_{\text{block}})$.
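A simulation can make this model concrete: draw a random effect for each block, add it to the fixed treatment effects on the log scale, and draw Poisson counts. The sketch below shows this under stated assumptions. The parameter values are hypothetical, and the Poisson sampler is Knuth's simple multiplication method, which is adequate only for moderate means.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (suitable for moderate lam, say < 500)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(mu, tau, sigma2_block, n_blocks, seed=1):
    """Simulate y_ij | b_j ~ Poisson(exp(mu + tau_i + b_j)) with b_j ~ N(0, sigma2_block).

    Returns one row per treatment and one column per block.
    """
    rng = random.Random(seed)
    blocks = [rng.gauss(0.0, math.sqrt(sigma2_block)) for _ in range(n_blocks)]
    return [[rpois(math.exp(mu + t + b), rng) for b in blocks] for t in tau]


# Hypothetical parameter values, chosen only to illustrate the model.
counts = simulate_counts(mu=2.0, tau=[0.0, 0.5, 1.0], sigma2_block=0.05, n_blocks=4, seed=7)
for row in counts:
    print(row)
```

All counts within one block share the same draw of $b_j$, so they are correlated. This is how the random block effect enters the data.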


5.2.3 Example 3: Control of Weeds in Cereal Crops in an RCBD


One of the main problems in growing cereal crops is competition between weeds and crop seedlings. Suppose a field supervisor wants to test five weed-control treatments plus a control in cereal crops, using a randomized complete block design with four blocks. Table 5.10 shows the number of weed plants observed in each plot ($y_{ij}$), with the treatment number in parentheses.

Table 5.11 shows the sources of variation and the degrees of freedom of a 
randomized complete block design used in this experiment. 

Since the response is a count, it is modeled with a GLMM with a Poisson response variable, stated below:

Distribution: $y_{ij} \mid b_j \sim \text{Poisson}(\lambda_{ij})$
$b_j \sim N(0, \sigma^2_{\text{block}})$
Table 5.10 Number of weeds in each treatment (the number in parentheses corresponds to the treatment number)

Block
A   (1) 438   (4) 17    (2) 538   (5) 18    (3) 77    (6) 115
B   (3) 61    (2) 422   (6) 57    (1) 442   (5) 26    (4) 31
C   (5) 77    (3) 157   (4) 87    (6) 100   (2) 377   (1) 319
D   (2) 315   (1) 380   (5) 20    (3) 52    (4) 16    (6) 45

Table 5.11 Analysis of variance: sources of variation and degrees of freedom

Sources of variation   Degrees of freedom
Block                  r - 1 = 4 - 1 = 3
Treatment              t - 1 = 6 - 1 = 5
Error                  (t - 1)(r - 1) = 15
Total                  tr - 1 = 23


Linear predictor: $\eta_{ij} = \mu + \tau_i + b_j$

Link function: $\log(\lambda_{ij}) = \eta_{ij}$

where $y_{ij}$ denotes the number of weed plants observed in treatment i and block j (i = 1, 2, ..., 6; j = 1, 2, 3, 4), $\eta_{ij}$ is the linear predictor, $\mu$ is the intercept, $\tau_i$ is the fixed effect due to treatment i, and $b_j$ is the random block effect ($b_j \sim N(0, \sigma^2_{\text{block}})$).

Using the GLIMMIX procedure, the following syntax specifies the analysis of a GLMM with a Poisson response.


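/* nobound: variance components are not constrained to be nonnegative */
/* method=laplace: Laplace approximation to the marginal likelihood */
proc glimmix data=DBCA nobound method=laplace;
class Block Trt;
model Count = Trt/dist=Poisson s;   /* Poisson response, log link; s prints fixed-effect solutions */
random Block;                       /* random block effect b_j */
lsmeans Trt/diff lines ilink;       /* treatment means, pairwise comparisons, letters, data-scale means */
run; quit;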


Note that the above syntax uses "method = laplace" (alternatively, "method = quadrature") to fit the mixed model and obtain the Pearson chi-square/DF fit statistic for the conditional distribution. If no integration method is specified, a generalized chi-square/DF statistic is obtained instead. The options after the "lsmeans" statement work as follows: "diff" provides pairwise comparisons between treatments, "lines" displays the pairwise comparisons of means using letter groupings, and "ilink" applies the inverse of the link function to report the means on the data scale. Some of the outputs are listed below.

Part (a) of Table 5.12 presents basic information about the model and the estimation procedure used.

Part (b) of Table 5.12 lists the "Dimensions" of the relevant matrices used in the model. The random effects matrix Z has four columns, one for each block. The fixed effects matrix X has seven columns: one for the intercept plus six for the treatments.
Table 5.12 Basic model information

(a) Model information

Data set                     WORK.DBCA
Response variable            Count
Response distribution        Poisson
Link function                Log
Variance function            Default
Variance matrix              Not blocked
Estimation technique         Maximum likelihood
Likelihood approximation     Laplace
Degrees of freedom method    Containment

(b) Dimensions

G-side Cov. parameters       1
Columns in X                 7
Columns in Z                 4
Subjects (blocks in V)       1
Max Obs per subject          24

Table 5.13 Model fit

(a) Fit statistics

-2 Log likelihood                434.46
AIC (smaller is better)          448.46
AICC (smaller is better)         455.46
BIC (smaller is better)          444.16
CAIC (smaller is better)         451.16
HQIC (smaller is better)         439.03

(b) Fit statistics for conditional distribution

-2 Log L (Count | r. effects)    418.66
Pearson's chi-square             283.09
Pearson's chi-square/DF          11.80


The "Fit statistics" and "Fit statistics for conditional distribution" (parts (a) and (b) of Table 5.13, respectively) describe how well the GLMM fits. The Pearson chi-square statistic is the sum of the squared Pearson residuals of the final model. Dividing it by its degrees of freedom measures how variable the observations are around the fitted mean, compared with the variability implied by the assumed distribution.

The value of Pearson's chi-square/DF for the conditional distribution is 11.8, well above 1. This value gives strong evidence of overdispersion in the dataset. In other words, it calls into question our assumptions about the distribution and the linear predictor, which suggests that the variance function was not adequately specified.
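The sketch below makes the statistic concrete by computing it with the Poisson variance function $V(\lambda) = \lambda$. The observed counts and fitted means are made up for illustration. In GLIMMIX, the fitted values are the conditional means, which include the predicted block effects. The divisor 24 (the number of observations) is an inference: it reproduces the ratio in Table 5.13, but the output does not state it.

```python
def pearson_chi2(y, mu, variance):
    """Pearson statistic: sum of squared Pearson residuals (y - mu)^2 / V(mu)."""
    return sum((obs - m) ** 2 / variance(m) for obs, m in zip(y, mu))


def poisson_variance(m):
    """Poisson variance function: Var(y) equals the mean."""
    return m


# Illustrative (made-up) observed counts and fitted conditional means.
y = [2, 4, 9]
mu = [3.0, 5.0, 6.0]
x2 = pearson_chi2(y, mu, poisson_variance)
print(round(x2, 4))  # 2.0333

# Table 5.13(b): dividing 283.09 by the 24 observations reproduces the reported ratio.
print(round(283.09 / 24, 2))  # 11.8
```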
Table 5.14 Variance component estimates, parameter estimates, and type III tests of fixed effects

(a) Covariance parameter estimates

Cov Parm                            Estimate   Standard error
Block ($\sigma^2_{\text{block}}$)   0.01840    0.01377

(b) Solutions for fixed effects

Effect      Trt             Estimate   Standard error   DF   t-value   Pr > |t|
Intercept   ($\mu$)         4.3637     0.08808          3    49.54     <0.0001
Trt         1 ($\tau_1$)    1.6056     0.06155          15   26.09     <0.0001
Trt         2 ($\tau_2$)    1.6508     0.06132          15   26.92     <0.0001
Trt         3 ($\tau_3$)    0.09042    0.07769          15   1.16      0.2627
Trt         4 ($\tau_4$)    -0.7416    0.09888          15   -7.50     <0.0001
Trt         5 ($\tau_5$)    -0.8101    0.1012           15   -8.00     <0.0001
Trt         6 ($\tau_6$)    0

(c) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      5        15       523.57    <0.0001


Table 5.15 Estimated least squares means ("Mean")

Trt least squares means

Trt   Estimate ($\hat{\eta}_i$)   Standard error   DF   t-value   Pr > |t|   Mean ($\hat{\lambda}_i$)   Standard error mean
1     5.9693                      0.07237          15   82.49     <0.0001    391.25                     28.3139
2     6.0145                      0.07217          15   83.33     <0.0001    409.34                     29.5437
3     4.4541                      0.08652          15   51.48     <0.0001    85.9802                    7.4390
4     3.6221                      0.1060           15   34.19     <0.0001    37.4150                    3.9643
5     3.5536                      0.1081           15   32.86     <0.0001    34.9372                    3.7784
6     4.3637                      0.08808          15   49.54     <0.0001    78.5467                    6.9186
The F-test of $H_0: \eta_1 = \eta_2 = \cdots = \eta_6$, or equivalently $H_0: \lambda_1 = \lambda_2 = \cdots = \lambda_6$, indicates a highly significant difference (P < 0.0001) in the average number of weeds for at least one treatment (part (c) of Table 5.14).

The estimates of the linear predictor on the model scale ($\hat{\eta}_i$) and of the mean on the data scale obtained through the inverse link ($\hat{\lambda}_i$) are calculated for each treatment as $\hat{\eta}_i = \hat{\mu} + \hat{\tau}_i$ and $\hat{\lambda}_i = \exp(\hat{\eta}_i)$, respectively. These values and their standard errors are listed in Table 5.15.
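The two formulas can be checked against the tables. In Table 5.14, treatment 6 is the reference level ($\hat{\tau}_6 = 0$), so its estimate equals the intercept. The sketch below rebuilds the model-scale and data-scale columns of Table 5.15 from the solutions in Table 5.14. The function name is illustrative.

```python
import math

# Solutions for fixed effects, Table 5.14(b); treatment 6 is the reference level (tau_6 = 0).
MU_HAT = 4.3637
TAU_HAT = {1: 1.6056, 2: 1.6508, 3: 0.09042, 4: -0.7416, 5: -0.8101, 6: 0.0}


def treatment_means(mu_hat, tau_hat):
    """Return {treatment: (eta_i, lambda_i)} with eta_i = mu + tau_i and lambda_i = exp(eta_i)."""
    return {trt: (mu_hat + tau, math.exp(mu_hat + tau)) for trt, tau in tau_hat.items()}


for trt, (eta, lam) in treatment_means(MU_HAT, TAU_HAT).items():
    print(f"Trt {trt}: eta = {eta:.4f}, lambda = {lam:.2f}")
```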

The “plots” option in the “proc GLIMMIX” statement creates a set of plots for the 
raw residuals, Pearson residuals, and studentized residuals. 

The panel consists of a plot of studentized residuals versus the linear predictor ($\hat{\eta}$), a histogram of the residuals with a normal density superimposed, a quantile plot of the residuals, and a box plot of the residuals. The panel of studentized residuals suggests a slightly skewed distribution (Fig. 5.2). In this figure, the spread of the residuals changes with the value of the linear predictor, which indicates that the assumption of constant variance is no longer met. The residual-quantile plot is consistent with this violation. A nonconstant variance may also point to an incorrect choice of response distribution or variance function.

Fig. 5.2 Studentized conditional residuals (panel title: Conditional Studentized Residuals)


5.2.4 Overdispersion in Poisson Data 


Linear mixed models assume that the observations are normally distributed conditional on the random effects, and that the mean $\mu$ is independent of the variance $\sigma^2$. In contrast, most GLMMs that assume a binomial or Poisson distribution fix the scale ("dispersion") parameter at 1. That is, once the mean is known, the variance is also known. Overdispersion is the extra variability that the random component of a generalized linear model does not predict. It can arise because the mean and variance of a GLM are related: both depend on the same parameter, which is predicted through the set of predictors. If a dataset is overdispersed, the estimated standard errors and the goodness-of-fit test statistics are distorted, and adjustments must be made. Specifically, the standard errors of the estimated parameters are too small, so the test statistics for the model parameters are too large (i.e., the type I error rate increases).

Overdispersion can be caused by several factors: omission of predictor variables from the model, high correlation among observations due to nested effects, misspecification of the systematic component, or an incorrect distribution for the data. Systematic deviations, including overdispersion, may result from incorrect assumptions about the stochastic and/or systematic component of the model. The model may also fit the dataset poorly because of an incorrect choice of link function. Systematic deviations can also come from missing random effects or from a lack of independence among observations. Adding the appropriate random factors generally addresses these violations and the associated problems with the systematic component.

According to Stroup (2013), overdispersion occurs when the variance exceeds the theoretical variance under the distribution assumed for the data. Overdispersion is theoretically possible for any distribution with a nontrivial variance function. Distributions in the one-parameter exponential family are especially exposed because they lack a scale parameter to mitigate the mean-variance relationship, so models such as the Poisson distribution are vulnerable to overdispersion. In summary, overdispersion occurs when:


(a) The variance is larger than expected, which leads to standard errors that are not 
correct. 

(b) The mean structure is not well specified. 

(c) The linear predictor $\eta$ is not well specified.

(d) The chosen distribution of the data is not appropriate. 

(e) Predictor variables are omitted. 

(f) Observations are significantly correlated. 


If we do not account for overdispersion, we underestimate the standard errors and inflate the test statistics. This inflates the type I error rate and makes the confidence intervals unreliable. Fig. 5.3 shows that the residuals spread more widely as the predicted mean $\hat{\lambda}$ increases, which indicates that the variance may grow with the mean. Fig. 5.4 shows a nonconstant variance on the model scale.

In the fit statistics of the GLMM with the Poisson distribution (part (b) of Table 5.13), Pearson's chi-square/DF = 11.8 indicates strong overdispersion in the dataset. The output provides another clue: the F statistic (F = 523.57) in part (c) of Table 5.14. A value this large may indicate that the fit is incorrect. Once the researcher has detected overdispersion, they must choose a strategy to remedy it. Three alternatives for evaluating (testing) and eliminating overdispersion are reviewed below.
Fig. 5.3 Conditional residuals versus predicted values on the data scale (panel title: Conditional Residuals by Predicted Values for Count)


5.2.4.1 Using the Scale Parameter 


The first alternative is to add a scale parameter and replace $\mathrm{Var}(y_{ij} \mid b_j) = \lambda_{ij}$ by $\mathrm{Var}(y_{ij} \mid b_j) = \phi\lambda_{ij}$. This replaces the logarithm of the conditional likelihood, $y_{ij}\log(\lambda_{ij}) - \lambda_{ij} - \log(y_{ij}!)$, with the quasi-likelihood $[y_{ij}\log(\lambda_{ij}) - \lambda_{ij}]/\phi$, on the assumption that a value $\phi > 1$ can adequately model the observed variance.
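Dividing the log-likelihood kernel by $\phi$ does not change the value of $\lambda$ that maximizes it. It only flattens the curvature, so in this simple setting the scale parameter affects standard errors rather than point estimates. The sketch below shows this with a grid search. Two simplifications are made for the illustration: the treatment 1 counts from Table 5.10 are treated as a single sample with no block effect, and $\phi = 19.4848$ is borrowed from Table 5.16.

```python
import math


def poisson_loglik(y, lam):
    """Poisson log-likelihood for a common mean lam: sum of y*log(lam) - lam - log(y!)."""
    return sum(obs * math.log(lam) - lam - math.lgamma(obs + 1) for obs in y)


def quasi_loglik(y, lam, phi):
    """Quasi-likelihood kernel from the text: sum of [y*log(lam) - lam] / phi."""
    return sum(obs * math.log(lam) - lam for obs in y) / phi


def argmax_on_grid(f, grid):
    """Return the grid point with the largest value of f."""
    return max(grid, key=f)


PHI = 19.4848  # borrowed from Table 5.16 for illustration

# Treatment 1 counts from Table 5.10, treated as one sample with no block effect (a simplification).
y = [438, 442, 319, 380]
grid = [300 + 0.25 * k for k in range(801)]  # 300.00, 300.25, ..., 500.00

lam_ml = argmax_on_grid(lambda lam: poisson_loglik(y, lam), grid)
lam_q = argmax_on_grid(lambda lam: quasi_loglik(y, lam, PHI), grid)
print(lam_ml, lam_q)  # both are the sample mean, 1579 / 4 = 394.75

# Curvature (observed information) at the maximum: sum(y) / lam**2, divided by phi under the
# quasi-likelihood, so the approximate standard error grows by a factor of sqrt(phi).
info_ml = sum(y) / lam_ml ** 2
info_q = info_ml / PHI
print(math.sqrt(info_ml / info_q))  # sqrt(PHI), about 4.414
```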

The following GLIMMIX syntax invokes this alternative of adding a scale 
parameter under a Poisson response variable. 


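proc glimmix;                         /* no METHOD= option: pseudo-likelihood estimation */
class Block Trt;
model Count = Trt/dist=Poisson;
random intercept/subject=Block;       /* random block effect b_j */
random _residual_;                    /* adds the scale (overdispersion) parameter phi */
lsmeans Trt/ilink;
run;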


The SAS code is very similar to the previous program, with the addition of the "random _residual_" statement. Note that the Laplace integration method ("method = laplace") has been removed, so the estimation is performed with the pseudo-likelihood (PL) method. The scale parameter is estimated and used to adjust the standard errors and test statistics. The GLIMMIX procedure uses the generalized chi-square divided by its degrees of freedom (Gener. chi-square/DF = $\hat{\phi}$) as the estimate of the scale parameter. All standard errors are multiplied by $\sqrt{\hat{\phi}}$, and all F-test values are divided by $\hat{\phi}$. Table 5.16 shows part of the results.

Fig. 5.4 Residuals on the model scale (panel title: Conditional Studentized Residuals by Predicted Values)
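The adjustment rule described above can be written out directly. In the sketch below, only the F arithmetic is checked against the tables. The standard errors in Table 5.16 are not simply those of Table 5.15 times $\sqrt{\hat{\phi}}$, because the pseudo-likelihood refit also changes the variance component and mean estimates. The function name `quasi_adjust` is illustrative.

```python
import math

PHI_HAT = 19.4848   # Gener. chi-square/DF used as the scale estimate (Table 5.16)
F_POISSON = 523.57  # Type III F for Trt from the Poisson fit without a scale parameter (Table 5.14)


def quasi_adjust(se, f_value, phi):
    """Scale-parameter adjustment described in the text.

    Standard errors are multiplied by sqrt(phi); F statistics are divided by phi.
    """
    return se * math.sqrt(phi), f_value / phi


_, f_adjusted = quasi_adjust(se=1.0, f_value=F_POISSON, phi=PHI_HAT)
print(round(f_adjusted, 2))  # 26.87, the F-value in Table 5.16(c)
```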

Table 5.16 shows the fit statistics (part (a)), the covariance parameter estimates (part (b)), and the value of the scale parameter, $\hat{\phi} = 19.4848$ (Residual (VC)). The F statistic for treatments under the Poisson distribution is now 26.87 (part (c)). This value is the F-value from the previous analysis divided by the scale parameter (523.57/19.4848). The results indicate that overdispersion persists even after this adjustment, since the chi-square/DF ratio increases from 11.8 to 19.48 (part (a)). Including the scale parameter changes the variance estimate due to blocks, $\sigma^2_{\text{block}}$, and the estimates of the treatment means (part (d)), but its main impact is on the standard errors.

Including the scale parameter implies a quasi-likelihood. There is no true likelihood for the model and, therefore, no actual probability distribution that yields an expected value $\lambda$ together with a variance $\phi\lambda$.
Table 5.16 Results of the adjustment by adding the scale parameter

(a) Fit statistics

-2 Res log pseudo-likelihood     29.48
Generalized chi-square           350.73
Gener. chi-square/DF             19.48

(b) Covariance parameter estimates

Cov Parm        Subject   Estimate   Standard error
Intercept       Block     0.004981   0.02077
Residual (VC)             19.4848    7.1346

(c) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      5        15       26.87     <0.0001

(d) Trt least squares means

Trt   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
1     5.9779     0.1166           15   51.29     <0.0001    394.60    45.9935
2     6.0231     0.1142           15   52.74     <0.0001    412.84    47.1443
3     4.4626     0.2396           15   18.63     <0.0001    86.7160   20.7753
4     3.6306     0.3609           15   10.06     <0.0001    37.7352   13.6205
5     3.5621     0.3734           15   9.54      <0.0001    35.2362   13.1576
6     4.3722     0.2504           15   17.46     <0.0001    79.2189   19.8383


5.2.4.2 Linear Predictor Review 


With count and binomial responses, it is important to check whether the linear predictor is correctly specified, in particular whether $\lambda_{ij}$ varies randomly among the experimental units within blocks. If it does, the ANOVA table should include the block x treatment source of variation, and this effect must be specified in the linear predictor of the GLMM. The linear predictor is then

$\eta_{ij} = \mu + \tau_i + b_j + (b\tau)_{ij}$

and the complete model is

Distribution: $y_{ij} \mid b_j, (b\tau)_{ij} \sim \text{Poisson}(\lambda_{ij})$
$b_j \sim N(0, \sigma^2_{\text{block}})$
$(b\tau)_{ij} \sim N(0, \sigma^2_{\text{block} \times \text{trt}})$
Linear predictor: $\eta_{ij} = \mu + \tau_i + b_j + (b\tau)_{ij}$
Link function: $\log(\lambda_{ij}) = \eta_{ij}$


The following GLIMMIX program fits the above model:

proc glimmix method=laplace;
class Block Trt;
model Count = Trt/dist=Poisson;
random intercept Trt/subject=Block;   /* block effect b_j and block x treatment effect (b tau)_ij */
lsmeans Trt/ilink;
run;

Table 5.17 Results of the fit by redefining the predictor of the model

(a) Fit statistics for conditional distribution

-2 Log L (Count | r. effects)    156.38
Pearson's chi-square             2.58
Pearson's chi-square/DF          0.11

(b) Covariance parameter estimates

Cov Parm    Subject   Estimate   Standard error
Intercept   Block     0.05969    0.05758
Trt         Block     0.1152     0.04115

(c) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      5        15       41.48     <0.0001

(d) Trt least squares means

Trt   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
1     5.9692     0.2106           15   28.34     <0.0001    391.20    82.3947
2     6.0037     0.2106           15   28.51     <0.0001    404.92    85.2707
3     4.3674     0.2170           15   20.12     <0.0001    78.8402   17.1100
4     3.4255     0.2301           15   14.89     <0.0001    30.7370   7.0714
5     3.4005     0.2298           15   14.80     <0.0001    29.9786   6.8891
6     4.3027     0.2175           15   19.79     <0.0001    73.8997   16.0707


Part of the output is shown in Table 5.17. The results in part (a) indicate that the overdispersion has been eliminated (Pearson's chi-square/DF = 0.11). However, a ratio this far below 1 carries a risk of underestimating the variance, so it is highly recommended that this ratio be close to 1. The estimated variance components (part (b)) for blocks and block x treatment are $\hat{\sigma}^2_{\text{block}} = 0.05969$ and $\hat{\sigma}^2_{\text{block} \times \text{trt}} = 0.1152$, respectively.

The type III tests of fixed effects are highly significant (P < 0.0001), which indicates that the six treatments are not equally effective in weed control (part (c)). The values in the "Mean" column of part (d) are the treatment means on the original data scale, with their standard errors. Compared with the previous analysis (with the scale parameter), the means change little, but the standard errors change more markedly.
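The following illustrates one way to see why the block x treatment term absorbs the overdispersion. In this RCBD, each block x treatment cell holds a single plot (6 treatments x 4 blocks = 24 observations), so $(b\tau)_{ij}$ acts as an observation-level random effect. The sketch ignores the block effect $b_j$ for simplicity and evaluates the marginal moments of a Poisson count with a normal random effect on the log scale. The values eta = 4.3 and sigma2 = 0.1152 are illustrative plug-ins loosely based on Tables 5.14 and 5.17, not a re-analysis of the data.

```python
import math


def poisson_lognormal_moments(eta, sigma2):
    """Marginal mean and variance of y when y | u ~ Poisson(exp(eta + u)) and u ~ N(0, sigma2).

    E[y]   = exp(eta + sigma2 / 2)
    Var[y] = E[y] + E[y]**2 * (exp(sigma2) - 1)   (law of total variance)
    """
    mean = math.exp(eta + sigma2 / 2.0)
    var = mean + mean ** 2 * (math.exp(sigma2) - 1.0)
    return mean, var


# Illustrative plug-in values: eta near a treatment estimate, sigma2 near the block x trt estimate.
for sigma2 in (0.0, 0.1152):
    m, v = poisson_lognormal_moments(4.3, sigma2)
    print(f"sigma2 = {sigma2}: mean = {m:.1f}, variance/mean = {v / m:.2f}")
```

With sigma2 = 0 the variance equals the mean, as for a plain Poisson model. With the plug-in value, the marginal variance is roughly ten times the mean, which is the kind of extra variability that a pure Poisson model misreads as overdispersion.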
5.2.4.3 Using a Different Distribution

Another way to deal with overdispersion when using a Poisson distribution is to change the assumed distribution of the response variable. Poisson variables have equal mean and variance, but in the biological sciences this assumption does not always hold for variables such as counts. As previously discussed, a negative binomial distribution is a good alternative (see Example 5.2). A negative binomial variable has mean $\lambda > 0$ and variance $\lambda + \phi\lambda^2$ with $\phi > 0$. That is, $E(y) = \lambda$ and $\mathrm{Var}(y) = \lambda + \phi\lambda^2$, where $\phi$ is the scale parameter. The components of this model are shown below.

Given that $y_{ij} \mid b_j \sim \text{Poisson}(\lambda_{ij})$, $\lambda_{ij}$ is assumed to follow a gamma distribution with shape parameter $1/\phi$, where $\phi$ is the scale parameter, and $b_j \sim N(0, \sigma^2_{\text{block}})$. The resulting GLMM is specified as follows:


Distribution: $y_{ij} \mid b_j \sim \text{Negative Binomial}(\lambda_{ij}, \phi)$
$b_j \sim N(0, \sigma^2_{\text{block}})$
Linear predictor: $\eta_{ij} = \mu + \tau_i + b_j$
Link function: $\log(\lambda_{ij}) = \eta_{ij}$
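The variance function and the gamma-Poisson construction can be checked by simulation. The sketch assumes the gamma distribution has shape $1/\phi$ and scale $\phi\lambda$, which gives it mean $\lambda$ and variance $\phi\lambda^2$. The values lam = 10 and phi = 0.5 are arbitrary choices for the illustration. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale, and the Poisson step reuses Knuth's simple sampler.

```python
import math
import random


def nb_variance(lam, phi):
    """Negative binomial variance function from the text: Var(y) = lam + phi * lam**2."""
    return lam + phi * lam ** 2


def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (suitable for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def rnegbin(lam, phi, rng):
    """Gamma-Poisson mixture: a gamma rate with mean lam and variance phi*lam**2, then a Poisson count."""
    rate = rng.gammavariate(1.0 / phi, phi * lam)  # shape 1/phi, scale phi*lam
    return rpois(rate, rng)


rng = random.Random(3)
draws = [rnegbin(10.0, 0.5, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean = {mean:.2f}, variance = {var:.1f}, theory = {nb_variance(10.0, 0.5):.1f}")
```

For scale, take the scale estimate reported in Table 5.18 ($\hat{\phi} = 0.1080$) and a mean of 80 weeds. The negative binomial variance is then nb_variance(80, 0.108), about 771, compared with 80 under the Poisson model.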


The following GLIMMIX statements fit the model with a negative binomial distribution.


proc glimmix method=laplace; 
class block Trt; 

model count = Trt/dist=NegBin; 
random block; 

lsmeans Trt/ ilink ; 

run; 


Some of the most relevant GLIMMIX outputs are presented in Table 5.18. The Pearson's chi-square/DF value of 0.88 (part (a)) shows that the overdispersion in the dataset has been removed. The estimated scale parameter in part (b) (Scale) is $\hat{\phi} = 0.1080$. This differs from the scale parameter estimated for the Poisson model with the "random _residual_" statement, because the two models calculate it differently. However, as mentioned above, in both the Poisson and negative binomial models the scale parameter governs the relationship between the mean and the variance.

Under the negative binomial distribution, the test statistic for the effect of treatments (part (c) of Table 5.18) is very similar to the value obtained with the Poisson distribution when the block x treatment interaction was added to the linear predictor (41.11 versus 41.48). The values under "Estimate" are estimates of the linear predictor on the model scale (part (d)). The values in the "Mean" column are the treatment means on the data scale under the negative binomial distribution. Of the three proposed alternatives for fitting these data, the last two (adding the block x treatment interaction to the predictor and assuming a negative binomial distribution) provide a better fit.
Table 5.18 Fitting results by redefining the model structure

(a) Fit statistics for conditional distribution

-2 Log L (Count | r. effects)    235.11
Pearson's chi-square             21.13
Pearson's chi-square/DF          0.88

(b) Covariance parameter estimates

Cov Parm   Estimate   Standard error
Block      0.07713    0.06955
Scale      0.1080     0.03768

(c) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      5        15       41.11     <0.0001

(d) Trt least squares means

Trt   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
1     6.0280     0.2179           15   27.66     <0.0001    414.90    90.4085
2     6.0465     0.2174           15   27.81     <0.0001    422.64    91.8789
3     4.3941     0.2228           15   19.72     <0.0001    80.9704   18.0426
4     3.5190     0.2335           15   15.07     <0.0001    33.7516   7.8815
5     3.4684     0.2338           15   14.83     <0.0001    32.0863   7.5030
6     4.3439     0.2235           15   19.44     <0.0001    77.0085   17.2111


5.2.5 Factorial Designs 


Many experiments involve studying the effects of two or more factors. Factorial 
designs are the most efficient for these types of experiments. In a factorial design, all 
possible combinations of factor levels are investigated in each replicate. If there are 
a levels of factor A and b levels of factor B, then each replicate contains all ab 
treatment combinations. 
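Enumerating the treatment combinations of an a x b factorial amounts to taking a Cartesian product. Here is a small sketch using Python's `itertools.product`. The level labels are hypothetical placeholders for the two genotypes and four culture media of the example that follows.

```python
from itertools import product


def treatment_combinations(levels_a, levels_b):
    """All a*b treatment combinations of a two-factor factorial, in standard order."""
    return list(product(levels_a, levels_b))


combos = treatment_combinations(["G1", "G2"], ["M1", "M2", "M3", "M4"])
print(len(combos))  # 8
print(combos[:3])   # [('G1', 'M1'), ('G1', 'M2'), ('G1', 'M3')]
```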


5.2.5.1 Example: A 2 x 4 Factorial with a Poisson Response


This application refers to a factorial experiment involving explants from cotyledons of cucumber (Cucumis sativus L.) with two factors, i.e., genotype (two levels) and culture medium (four levels). Each of the eight genotype-culture combinations was applied to four Petri dishes, and each dish contained six leaf explants. The response variable was the number of buds on each leaf explant, i.e., a count. There are two sources of random variation in this application, namely, variation between Petri dishes and variation between explants within Petri dishes (Table 5.19).

The sources of variation and degrees of freedom for this experiment are shown in 
Table 5.20. 
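The degrees-of-freedom arithmetic of Table 5.20 can be reproduced as follows. The sketch mirrors the rows of that table as printed and obtains the error degrees of freedom by difference. The values a = 2 genotypes, b = 4 culture media, c = 4 Petri dishes, and r = 6 explants per dish come from the description above, and the function name is illustrative.

```python
def factorial_df(a, b, c, r):
    """Degrees of freedom for the layout of Table 5.20.

    a genotypes, b culture media, c Petri dishes per combination, r explants per dish.
    """
    df = {
        "Genotype": a - 1,
        "Culture": b - 1,
        "Genotype x culture": (a - 1) * (b - 1),
        "Petri.dish x Explant": c * r - 1,
    }
    total = a * b * c * r - 1
    df["Error"] = total - sum(df.values())  # obtained by difference
    df["Total"] = total
    return df


for source, value in factorial_df(2, 4, 4, 6).items():
    print(f"{source}: {value}")
```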
The components that define this model are shown below:

Distribution: $y_{ijkl} \mid \text{petri.dish}_k, \text{explant(petri.dish)}_{l(k)} \sim \text{Poisson}(\lambda_{ijkl})$
$\text{petri.dish}_k \sim N(0, \sigma^2_{\text{petri.dish}})$
$\text{explant(petri.dish)}_{l(k)} \sim N(0, \sigma^2_{\text{explant(petri.dish)}})$
Linear predictor: $\eta_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \text{petri.dish}_k + \text{explant(petri.dish)}_{l(k)}$
Link function: $\log(\lambda_{ijkl}) = \eta_{ijkl}$

where $\eta_{ijkl}$ is the linear predictor for genotype i (i = 1, 2), culture medium j (j = 1, 2, 3, 4), Petri dish k (k = 1, 2, 3, 4), and explant l (l = 1, 2, ..., 6); $\mu$ is the intercept; $\alpha_i$ is the fixed effect due to genotype i; $\beta_j$ is the fixed effect due to culture medium j; $(\alpha\beta)_{ij}$ is the effect of the interaction between genotype i and culture medium j; $\text{petri.dish}_k$ is the random effect of the Petri dish; and $\text{explant(petri.dish)}_{l(k)}$ is the random effect of the explant within the Petri dish, with $\text{petri.dish}_k \sim N(0, \sigma^2_{\text{petri.dish}})$ and $\text{explant(petri.dish)}_{l(k)} \sim N(0, \sigma^2_{\text{explant(petri.dish)}})$.

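The following GLIMMIX procedure fits a factorial experiment with a Poisson response.

/* SAS variable names cannot contain a period, so the Petri dish variable is named petri_dish */
proc glimmix method=laplace;
class genotype culture petri_dish explant;
model y = genotype|culture/dist=Poisson;     /* main effects and genotype x culture interaction */
random petri_dish explant(petri_dish);       /* dish effect and explant-within-dish effect */
lsmeans genotype|culture/ilink lines;
run;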


Some of the SAS output is shown in Table 5.21. Note that "method = laplace" was used for estimation and to obtain Pearson's fit statistic $X^2/DF$ (part (a)). The result indicates evidence of overdispersion (Pearson's chi-square/DF = 1.84).

As discussed before, overdispersion implies more variability in the data than expected, which may explain the lack of fit of a Poisson model. Part (b) shows the variance component estimates due to Petri dishes, $\hat{\sigma}^2_{\text{petri.dish}} = 0.003616$, and due to explants within Petri dishes, $\hat{\sigma}^2_{\text{explant(petri.dish)}} = 0.01462$. The type III tests of fixed effects indicate statistically significant effects of genotype, culture medium, and the interaction of both factors (part (c)).

The plot of residuals against the linear predictor in Fig. 5.5 provides further 
evidence of possible overdispersion. 

The least squares means on the model scale for genotype (part (a)), culture medium (part (b)), and the interaction between both factors (part (c)) are listed in the "Estimate" column of Table 5.22. The "Mean" column gives the corresponding means on the data scale.
Table 5.19 Number of buds counted in the cucumber experiment

                                     Explant
Genotype   Culture   Petri dish   1   2   3   4   5   6
Table 5.20 Sources of variation and degrees of freedom

Sources of variation     Degrees of freedom
Genotype                 a - 1 = 2 - 1 = 1
Culture                  b - 1 = 4 - 1 = 3
Genotype x culture       (a - 1)(b - 1) = 3
Petri.dish x Explant     cr - 1 = 4 x 6 - 1 = 23
Error                    (by difference) = 161
Total                    abcr - 1 = 191
Table 5.21 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the Poisson distribution

(a) Fit statistics for conditional distribution

-2 Log L (y | r. effects)    1168.16
Pearson's chi-square         354.01
Pearson's chi-square/DF      1.84

(b) Covariance parameter estimates

Cov Parm               Estimate   Standard error
Petri.dish             0.003616   0.006014
Explant (Petri.dish)   0.01462    0.008798

(c) Type III tests of fixed effects

Effect             Num DF   Den DF   F-value   Pr > F
Genotype           1        161      11.01     0.0011
Culture            3        161      57.30     <0.0001
Genotype*culture   3        161      3.95      0.0095


Since the data are overdispersed, we fit the GLMM again using the negative binomial distribution, that is, under the following GLMM:

Distribution: $y_{ijkl} \mid \text{petri.dish}_k, \text{explant(petri.dish)}_{l(k)} \sim \text{Negative Binomial}(\lambda_{ijkl}, \phi)$
$\text{petri.dish}_k \sim N(0, \sigma^2_{\text{petri.dish}})$
$\text{explant(petri.dish)}_{l(k)} \sim N(0, \sigma^2_{\text{explant(petri.dish)}})$
Linear predictor: $\eta_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \text{petri.dish}_k + \text{explant(petri.dish)}_{l(k)}$
Link function: $\log(\lambda_{ijkl}) = \eta_{ijkl}$

with scale parameter $\phi$.
The following GLIMMIX program fits a GLMM with a negative binomial response variable.

proc glimmix method=laplace;
class genotype culture petri_dish explant;
model y = genotype|culture/dist=NegBin link=log;   /* negative binomial response, log link */
random petri_dish explant(petri_dish);
lsmeans genotype|culture/ilink lines;              /* data-scale means and LSD letter groupings */
run;


This program is very similar to the previous one. The only difference is that it now uses a negative binomial distribution ("dist = negbin"). Part of the results is presented in Table 5.23. As mentioned earlier, the negative binomial distribution is another model for count variables when the dataset is overdispersed. If Pearson's chi-square divided by its degrees of freedom is less than or equal to 1, the overdispersion is 0 or close to 0, which means that the model adequately captures the extra variability.
Fig. 5.5 Studentized conditional residuals (panel title: Conditional Studentized Residuals)


For the conditional distribution, the Pearson's chi-square/DF fit statistic (0.83) shows no evidence of overdispersion. This justifies the negative binomial distribution as a better choice than the Poisson distribution used above. Part (b) shows that the estimated scale parameter is $\hat{\phi} = 0.1712$. This value differs from the scale parameter of the quasi-Poisson model obtained with the "random _residual_" statement. Note that the variance components changed slightly. In addition, the type III tests for the fixed effects in part (c) of Table 5.23 show a significant effect of genotype, culture, and the interaction between both factors (genotype*culture) on the number of buds per leaf explant.

The "lines" option in the "lsmeans" statement produces Fisher's least significant difference (LSD) groupings of the means for both factors and their interaction. The means and their standard errors, on the model scale ("Estimate" column) and on the data scale ("Mean" column), are tabulated in Table 5.24 for genotype, Table 5.25 for culture medium, and Table 5.26 for the interaction between both factors. For genotype (Table 5.24), the estimates are the values of the linear predictor $\hat{\eta}_i$ on the model scale and the means on the data scale are $\hat{\lambda}_i$ (part (a)). The comparison of means (on the model scale) is tabulated in part (b).
Table 5.22 Estimates on the model scale and means on the data scale under the Poisson distribution

(a) Genotype least squares means

Genotype   Estimate   Standard error   DF    t-value   Pr > |t|   Mean      Standard error mean
1          2.2979     0.05165          161   44.49     <0.0001    9.9533    0.5141
2          2.1345     0.05298          161   40.29     <0.0001    8.4531    0.4479

(b) Culture least squares means

Culture    Estimate   Standard error   DF    t-value   Pr > |t|   Mean      Standard error mean
1          2.1984     0.06180          161   35.57     <0.0001    9.0107    0.5569
2          2.5684     0.05607          161   45.81     <0.0001    13.0456   0.7314
3          2.4609     0.05738          161   42.89     <0.0001    11.7156   0.6723
4          1.6371     0.07445          161   21.99     <0.0001    5.1402    0.3827

(c) Genotype*culture least squares means

Genotype   Culture   Estimate   Standard error   DF    t-value   Pr > |t|   Mean      Standard error mean
1          1                    0.07676          161             <0.0001
1          2                    0.06395          161             <0.0001    16.1018
1          3         2.5331     0.06932          161   36.54     <0.0001    12.5925   0.8729
1          4         1.6331     0.09793          161   16.68     <0.0001    5.1196    0.5014
2          1         2.1503     0.07958          161   27.02     <0.0001    8.5877    0.6834
2          2         2.3580     0.07370          161             <0.0001    10.5694   0.7790
2          3         2.3887     0.07290          161             <0.0001    10.8997   0.7945
2          4         1.6411     0.09760          161   16.81     <0.0001    5.1609    0.5037


Table 5.23 Conditional fit statistics, variance component estimates, and type III tests of fixed effects under the negative binomial distribution

(a) Fit statistics for conditional distribution

-2 Log L (y | r. effects)     1143.90
Pearson's chi-square          159.95
Pearson's chi-square/DF       0.83

(b) Covariance parameter estimates

Cov Parm              Estimate   Standard error
Petri.dish            -0.02717
Explant(Petri.dish)   -0.04323
Scale                 0.1712     0.03514

(c) Type III tests of fixed effects

Effect            Num DF  Den DF  F-value  Pr > F
Genotype          1       161     4.43     0.0369
Culture           3       161     25.91    <0.0001
Genotype*culture  3       161     1.44     0.0322
Table 5.24 Estimates on the model scale and means on the data scale under the negative binomial distribution

(a) Cultivar least squares means

Genotype  Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean
1         2.3054    0.05407         161  42.64    <0.0001   10.0287  0.5423
2         2.1426    0.05535         161  38.71    <0.0001   8.5219   0.4717

(b) T grouping of genotype least squares means (α = 0.05)

LS means with the same letter are not significantly different

Genotype  Estimate
1         2.3054    A
2         2.1426    B


Table 5.25 Means estimates on the model scale and data scale for the culture medium

(a) Culture least squares means

Culture   Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean
1         2.2061    0.07653         161  28.82    <0.0001   9.0802   0.6950
2         2.5766    0.07198         161  35.80    <0.0001   13.1527  0.9468
3         2.4684    0.07300         161  33.81    <0.0001   11.8031  0.8617
4         1.6451    0.08708         161  18.89    <0.0001   5.1815   0.4512

(b) T grouping of culture least squares means (α = 0.05)

LS means with the same letter are not significantly different

Culture   Estimate
2         2.5766    A
3         2.4684    A
1         2.2061    B
4         1.6451    C


For the culture medium (Table 5.25), the estimated values in this comparison of means correspond to the values of the linear predictor $\eta_j$ (on the model scale). Applying the inverse link to $\eta_j$, that is, $\lambda_j = \exp(\eta_j)$, gives the values under the “Mean” column, which are the means on the data scale (part (a)). The mean comparisons on the model scale are shown in part (b).
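The “Mean” column can be reproduced by hand. The minimal sketch below applies the inverse of the log link to an estimate from Table 5.25 and approximates the standard error on the data scale with the first-order delta method, SE(mean) ≈ exp(η)·SE(η). That this is exactly how the ilink standard errors are computed is an assumption here, although the tabulated values are consistent with it; the function name `to_data_scale` is hypothetical.

```python
import math


def to_data_scale(estimate, se_estimate):
    """Back-transform a log-link least squares mean to the data scale.

    mean = exp(estimate); its standard error is approximated with the
    first-order delta method: se(mean) = exp(estimate) * se(estimate).
    """
    mean = math.exp(estimate)
    return mean, mean * se_estimate


# Culture medium 2 in Table 5.25: Estimate 2.5766, standard error 0.07198
mean, se_mean = to_data_scale(2.5766, 0.07198)
print(f"mean = {mean:.4f}, standard error = {se_mean:.4f}")
```

The result agrees with the tabulated 13.1527 and 0.9468 up to the rounding of the printed estimate.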

The results indicate that culture media 2 and 3 produced statistically similar average numbers of buds (both labeled A), and that both produced significantly more buds than culture media 1 and 4, which also differ from each other (see Fig. 5.6).

For the interaction between both factors, the estimates on the model scale and the average number of buds on the data scale are shown in Table 5.26.
Table 5.26 Estimates on the model scale and means on the data scale for the interaction between genotype and culture medium

Genotype*culture least squares means

Genotype  Culture  Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean
1         1        2.2540    0.1072          161  21.02    <0.0001   9.5255   1.0212
1         2        2.7869    0.09844         161  28.31    <0.0001   16.2310  1.5978
1         3        2.5401    0.1020          161  24.91    <0.0001   12.6805  1.2933
1         4        1.6408    0.1233          161  13.31    <0.0001   5.1595   0.6360
2         1        2.1582    0.1093          161  19.75    <0.0001   8.6558   0.9457
2         2        2.3663    0.1050          161  22.53    <0.0001   10.6582  1.1196
2         3        2.3967    0.1045          161  22.94    <0.0001   10.9865  1.1478
2         4        1.6493    0.1230          161  13.41    <0.0001   5.2036   0.6401
Fig. 5.6 Comparison of the average number of buds as a function of the type of culture medium (LSD, α = 0.05)


The values under “Estimate” (Table 5.26) correspond to those of the linear predictor $\eta_{ij}$ (model scale), whereas the values under “Mean” correspond to the means $\lambda_{ij}$ on the data scale.

Graphically, Fig. 5.7 shows that genotype 1 in culture medium 2 produced the highest number of buds, whereas its lowest number of buds was observed in culture medium 4. For genotype 2, the highest numbers of buds were observed in culture media 2 and 3. Overall, culture medium 4 is the least suitable for both genotypes.
Fig. 5.7 Effect of the cultivar × culture medium interaction on the average number of buds (LSD, α = 0.05). Series: genotype 1 and genotype 2; horizontal axis: culture medium; vertical axis: average number of buds.


5.2.6 Latin Square (LS) Design 


A Latin square (LS) design is used when heterogeneity is associated with the crossing of two blocking factors, generally both with the same number of levels. This design was originally used in agricultural experimentation with plots placed in a square arrangement, where heterogeneity was expected along the rows and columns of the square, so blocking is done in both directions, across rows and across columns. Whenever blocking in two directions is appropriate, an LS design is a good option. Some examples are provided below to illustrate the use of this experimental design:


* Field experiments on plots set in a square arrangement with rows and columns 
that contribute to the heterogeneity between plots. For example, gradients of 
fertility, moisture, management practices, and so on. 

* Experiments in greenhouses, rooms with a controlled environment, or growth 
chambers where the placement of shelves, trays, etc. with respect to walls or light 
sources can introduce systematic variability related to temperature, humidity, or 
light in different directions (e.g., left to right, back to front, or top to bottom). 

* Laboratory experiments in which there are two potential sources of variability 
(e.g., technicians, machines, etc.) and researchers are aware of the possible impact 
of variation from both sources. 


For an LS layout, the number of rows (r) and the number of columns (c) should equal the number of treatments (t), which is also the number of replicates of each treatment. Treatments are assigned so that each treatment appears exactly once in each row and exactly once in each column, with each row and column containing a full set of treatments. Thus, the treatment effect estimates are independent of the differences between rows or columns, and rows, columns, and treatments are orthogonal to each other.

The analysis of variance for this experimental design, assuming that there are r rows, c columns, and t treatments, with r = c = t, contains the following sources of variability (Table 5.27).

Table 5.27 Sources of variation and degrees of freedom for a Latin square design

Sources of variation   Degrees of freedom
Rows                   t - 1
Columns                t - 1
Treatments             t - 1
Error                  (t - 1)(t - 2)
Total                  t × t - 1
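A square with these properties can be generated systematically by shifting the list of treatments one position per row. The sketch below (all function names are hypothetical) builds such a square, checks the once-per-row and once-per-column property, and computes the degrees of freedom of Table 5.27. It is only an illustration: in a real experiment, the rows, columns, and treatment labels of the systematic square would also be randomized.

```python
def cyclic_latin_square(treatments):
    """Build a t x t Latin square by shifting the treatment list one place per row.

    Row j, column k receives treatments[(j + k) % t].
    """
    t = len(treatments)
    return [[treatments[(j + k) % t] for k in range(t)] for j in range(t)]


def is_latin_square(square):
    """True if every row and every column contains each symbol exactly once."""
    t = len(square)
    symbols = set(square[0])
    if len(symbols) != t:
        return False
    if not all(len(row) == t and set(row) == symbols for row in square):
        return False
    return all({square[j][k] for j in range(t)} == symbols for k in range(t))


def latin_square_df(t):
    """ANOVA degrees of freedom for a t x t Latin square (as in Table 5.27)."""
    return {
        "Rows": t - 1,
        "Columns": t - 1,
        "Treatments": t - 1,
        "Error": (t - 1) * (t - 2),
        "Total": t * t - 1,
    }


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print(latin_square_df(4))
```

For t = 4, the error term has (4 - 1)(4 - 2) = 6 degrees of freedom, and rows, columns, treatments, and error add up to the 15 total degrees of freedom.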

From the analysis of variance table, the linear model for an LS design with t treatments is as follows:

$$y_{ijk} = \mu + f_j + c_k + \tau_i + \varepsilon_{ijk}$$

where $y_{ijk}$ is the response observed for treatment i in row j and column k, $\mu$ is the overall mean, $f_j$ is the random effect of row j, assuming $f_j \sim N(0, \sigma_f^2)$, $c_k$ is the random effect of column k, assuming $c_k \sim N(0, \sigma_c^2)$, $\tau_i$ is the fixed effect of treatment i, and $\varepsilon_{ijk}$ is the random error term, distributed as $N(0, \sigma^2)$. Note that treatment i occupies the cell in row j and column k of the square.


5.2.6.1 Latin Square Design with a Poisson Response 


In a series of field experiments, several "inducer-attractant" strategies were tested to 
control insect pests in oilseed rape. In one experiment, the use of wild turnip rape 
(turnip rape) as an earlier flowering trap crop (TR) (the "attractor") was tested 
together with the use of a repellent (an antifeedant) applied to oilseed rape in spring 
(S, the "inducer"). Untreated oilseed rape (U) was included as a control. The 
experiment was set up as a 6 x 6 Latin square with two replicates of each of the 
three treatments per row and column. An assessment of the number of mature pollen 
beetles was made on 10 plants per plot in early April, 1 day after spraying the 
repellent (antifeedant). The average number of adult beetles sampled on 10 plants 
per plot was recorded (Appendix 1: Data: Beatles). The question is: Is there evidence 
that the attractor or inducer works? That is, are fewer beetles present in the proposed 
treatments compared to the control? 
The model components that define this GLMM are as described below: 
Distribution: $y_{ijkl} \mid f_j, c_k \sim \text{Poisson}(\lambda_{ijkl})$
$f_j \sim N(0, \sigma_f^2)$, $c_k \sim N(0, \sigma_c^2)$
Linear predictor: $\eta_{ijkl} = \eta + f_j + c_k + \tau_i$
Link function: $\log(\lambda_{ijkl}) = \eta_{ijkl}$

where $\eta_{ijkl}$ is the linear predictor that relates the effect of replicate l (l = 1, 2) in row j (j = 1, 2, ..., 6) and column k (k = 1, 2, ..., 6) when treatment i is applied (i = 1, 2, 3), $\eta$ is the intercept, $\tau_i$ is the fixed effect of treatment i, $f_j$ is the random effect of row j, and $c_k$ is the random effect of column k. The model assumes that there is no interaction between rows and columns, between treatments and rows, or between treatments and columns. The model uses the linear predictor ($\eta_{ijkl}$) to estimate the treatment means ($\lambda_{ijkl} = \exp(\eta_{ijkl})$).
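To see how the pieces of this GLMM fit together, the following sketch simulates one dataset from it: normal row and column effects are added to a treatment effect on the log scale, and each count is drawn from a Poisson distribution with mean λ = exp(η). The parameter values and the cyclic 6 × 6 layout (each of the three treatments twice per row and column) are hypothetical, not estimates from the pollen beetle data or the actual field layout. The Poisson draw uses Knuth's multiplication method so that only the Python standard library is needed; the function names are also hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method.

    Adequate for the small means used here; it becomes slow for large lam.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_latin_square_counts(layout, intercept, tau, sigma_row, sigma_col, seed=1):
    """Simulate y | f_j, c_k ~ Poisson(exp(intercept + f_j + c_k + tau_i)).

    layout[j][k] is the treatment label in row j and column k, and tau maps
    each label to its fixed effect on the log scale. Row and column effects
    are drawn from N(0, sigma_row^2) and N(0, sigma_col^2).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(layout), len(layout[0])
    f = [rng.gauss(0.0, sigma_row) for _ in range(n_rows)]
    c = [rng.gauss(0.0, sigma_col) for _ in range(n_cols)]
    records = []
    for j in range(n_rows):
        for k in range(n_cols):
            trt = layout[j][k]
            eta = intercept + f[j] + c[k] + tau[trt]
            records.append((j + 1, k + 1, trt, poisson_draw(math.exp(eta), rng)))
    return records


# Hypothetical cyclic 6 x 6 layout: each treatment appears twice per row and column
treatments = ["S", "TR", "U"]
layout = [[treatments[(j + k) % 3] for k in range(6)] for j in range(6)]

# Hypothetical parameter values on the log scale
data = simulate_latin_square_counts(
    layout, intercept=1.6, tau={"S": -0.1, "TR": 0.3, "U": 0.0},
    sigma_row=0.2, sigma_col=0.2, seed=7,
)
print(data[:3])  # (row, column, treatment, count)
```

The simulated (row, column, treatment, count) records have the same structure as the data analyzed by the GLIMMIX program below.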
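The following GLIMMIX program fits a Latin square design with a Poisson response:

proc glimmix nobound method=laplace;
class Row Column Treatment;
model count = treatment / dist=poisson link=log;   /* Poisson response with log link */
random row column;                                  /* random row and column effects */
lsmeans treatment / lines ilink;                    /* LSD letter groups and data-scale means */
run;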


Part of the output is shown in Table 5.28. In the fit statistics (part (a)), the value of Pearson's chi-square divided by its degrees of freedom is less than 1 (Pearson's chi-square/DF = 0.55), indicating that there is no overdispersion in the data and that the Poisson distribution adequately models the dataset.
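The dispersion check can be written out explicitly. The sketch below (hypothetical function name) computes Pearson's chi-square as the sum of squared differences between observed counts and fitted conditional means, each divided by the variance function (V(μ) = μ for the Poisson). In Tables 5.28, 5.32, and 5.33, the reported ratio equals Pearson's chi-square divided by the number of observations (19.78/36 ≈ 0.55, 504.44/112 ≈ 4.50, and 79.36/112 ≈ 0.71), so that divisor is used here. This is inferred from those tables rather than taken from the SAS documentation, and the observed and fitted values in the example call are made up.

```python
def pearson_dispersion(y, mu, variance=lambda m: m):
    """Return Pearson's chi-square and its ratio to the number of observations.

    y: observed counts; mu: fitted conditional means (same order);
    variance: variance function V(mu), the Poisson default being V(mu) = mu.
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2, chi2 / len(y)


# Made-up observed counts and fitted means, only to show the call
chi2, ratio = pearson_dispersion([2, 4, 6], [3.0, 3.0, 6.0])
print(round(chi2, 4), round(ratio, 4))  # 0.6667 0.2222
```

Ratios well above 1 point to overdispersion. For a negative binomial fit, the `variance` argument would be replaced by the corresponding variance function.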

The type III tests of fixed effects in part (b) indicate that there is no significant evidence of differences between the treatments at the 5% significance level (P = 0.0621).
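Because the experiment compares three treatments (S, TR, and U), the treatment test has 3 - 1 = 2 numerator degrees of freedom. For that case, the tail probability of the F distribution has a closed form, which makes the reported P-value easy to verify without statistical tables. The sketch below (hypothetical function name) evaluates it at F = 3.14 with 23 denominator degrees of freedom. It gives approximately 0.062, in line with P = 0.0621; the small difference comes from the rounding of F.

```python
def f_pvalue_num_df2(f_value, den_df):
    """Upper-tail probability P(F > f_value) for an F(2, den_df) distribution.

    With 2 numerator degrees of freedom the survival function has the
    closed form (1 + 2 * f / den_df) ** (-den_df / 2).
    """
    if f_value < 0 or den_df <= 0:
        raise ValueError("f_value must be >= 0 and den_df > 0")
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)


# Treatment test in Table 5.28: F = 3.14 with 23 denominator degrees of freedom
p = f_pvalue_num_df2(3.14, 23)
print(round(p, 4))
```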

Part (c) of Table 5.28 shows the estimates of treatments on the model scale 
(“Estimate”) and on the data scale (“Mean”) with their respective standard errors. 
The values 4.6191, 6.9396, and 5.1561 (under the “Mean” column) correspond to 
the treatment means for S, TR, and U, respectively. 
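Because the model uses a log link, the difference between two estimates on the model scale is the logarithm of the ratio of the corresponding means on the data scale. The sketch below (hypothetical names) uses the estimates in Table 5.28 to express each treatment relative to the untreated control U. Given the nonsignificant treatment test (P = 0.0621), these ratios are descriptive only.

```python
import math

# Treatment estimates on the model (log) scale, Table 5.28, part (c)
estimates = {"S": 1.5302, "TR": 1.9372, "U": 1.6402}


def mean_ratio(est, a, b):
    """Ratio of data-scale means implied by a log link: exp(eta_a - eta_b)."""
    return math.exp(est[a] - est[b])


for trt in ("S", "TR"):
    print(f"{trt} vs U: {mean_ratio(estimates, trt, 'U'):.3f}")
```

The ratios agree with the ratios of the tabulated means. For example, 6.9396/5.1561 ≈ 1.346 for TR versus U, so in this sample the TR plots averaged about 35% more beetles than the control.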


5.2.6.2 Randomized Complete Block Design in a Split Plot 


Sometimes the researcher is interested in testing multiple factors using experimental units of different sizes, and, in most cases, the treatment combinations cannot be completely randomized to the experimental units. Suppose that one wishes to test two factors, A and B, with a and b levels, respectively. The levels of the first factor (A) are randomly applied to the primary experimental units. Then, the levels of the second factor (B) are applied to the secondary subunits formed within the primary unit to which the first factor was applied. In other words, the primary experimental unit (whole plot) is used for the application of the first factor, and it is then divided to form the secondary experimental units (subplots) for the application of the levels of the second factor. Since the split-plot design has two levels of experimental units, the whole plots (primary units) and subplots (secondary units) have different experimental errors. Split-plot experiments were invented in agriculture by Fisher (1925), and their importance in industrial experimentation has been widely recognized (Yates 1935).

Table 5.28 Results of the analysis of variance

(a) Fit statistics for conditional distribution

-2 Log L (count | r. effects)   147.03
Pearson's chi-square            19.78
Pearson's chi-square/DF         0.55

(b) Type III tests of fixed effects

Effect     Num DF  Den DF  F-value  Pr > F
Treatment  2       23      3.14     0.0621

(c) Treatment least squares means

Treatment  Estimate  Standard error  DF  t-value  Pr > |t|  Mean    Standard error mean
S          1.5302    0.1343          23  11.39    <0.0001   4.6191  0.6204
TR         1.9372    0.1096          23  17.68    <0.0001   6.9396  0.7605
U          1.6402    0.1271          23  12.90    <0.0001   5.1561  0.6555

As a simple illustration, consider a study of the effect of three pulp preparation methods (factor A) and four temperature levels (factor B) on paper tensile strength (paper quality). A batch of pulp is produced by one of the three methods and is then divided into four equal portions (samples). Each portion is cooked at a specific temperature. The assignment of treatments to whole plots and subplots is shown in Table 5.29.
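The two-stage randomization described above can be sketched as follows (function name hypothetical). Within each block, the levels of factor A are shuffled over the whole plots. Then, independently within each whole plot, the levels of factor B are shuffled over the subplots. The labels A1 to A3 (preparation methods), B1 to B4 (temperatures), and the three blocks follow Table 5.30. The seed is arbitrary, so the resulting plan is a hypothetical one, not the plan in Table 5.29.

```python
import random


def randomize_split_plot(n_blocks, a_levels, b_levels, seed=None):
    """Randomize a split-plot experiment laid out in randomized complete blocks.

    Returns (block, whole_plot, subplot, a_level, b_level) tuples: factor A is
    randomized to whole plots within each block, and factor B is randomized
    to subplots within each whole plot.
    """
    rng = random.Random(seed)
    plan = []
    for block in range(1, n_blocks + 1):
        a_order = rng.sample(a_levels, len(a_levels))  # whole-plot randomization
        for whole_plot, a in enumerate(a_order, start=1):
            b_order = rng.sample(b_levels, len(b_levels))  # subplot randomization
            for subplot, b in enumerate(b_order, start=1):
                plan.append((block, whole_plot, subplot, a, b))
    return plan


plan = randomize_split_plot(3, ["A1", "A2", "A3"], ["B1", "B2", "B3", "B4"], seed=2024)
for row in plan[:4]:
    print(row)
```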

The standard ANOVA model for two factors in a split-plot design in randomized complete blocks, in which the three levels of factor A are applied to whole plots and the four levels of factor B are applied to subplots within each whole plot, is described below:

$$y_{ijk} = \mu + \alpha_i + r_k + \alpha(r)_{ik} + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

where $y_{ijk}$ is the observed response at level i (i = 1, 2, 3) of factor A and at level j (j = 1, 2, 3, 4) of factor B in block k (k = 1, 2, 3), $\mu$ is the overall mean, $\alpha_i$ is the effect of level i of factor A, $r_k$ is the random effect of blocks, assuming $r_k \sim N(0, \sigma_r^2)$, $\alpha(r)_{ik}$ is the random whole-plot error effect, assuming $\alpha(r)_{ik} \sim N(0, \sigma_{\alpha(r)}^2)$, $\beta_j$ is the effect of level j of factor B, $(\alpha\beta)_{ij}$ is the fixed interaction effect of level i of factor A and level j of factor B, and $\varepsilon_{ijk}$ is the normal random experimental error, $\varepsilon_{ijk} \sim \text{iid } N(0, \sigma^2)$. The ANOVA table with the sources of variation for this experimental design is shown in Table 5.30.

Table 5.29 Assigning treatments to whole plots and subplots

[Layout: three blocks (Block 1, Block 2, Block 3). Within each block, the three preparation methods (A1, A2, A3) are randomized to the whole plots, and within each whole plot the four temperatures (B1 to B4) are randomized to the subplots. Each cell ABij denotes preparation method i combined with temperature j.]

Table 5.30 Sources of variation and degrees of freedom for a randomized block design with a split-plot treatment arrangement

Sources of variation   Degrees of freedom
Blocks                 r - 1 = 3 - 1 = 2
Factor A               a - 1 = 3 - 1 = 2
Error(a)               (a - 1)(r - 1) = 4
Factor B               b - 1 = 4 - 1 = 3
A × B                  (a - 1)(b - 1) = 6
Error(b)               a(r - 1)(b - 1) = 3 × 2 × 3 = 18
Total                  r × a × b - 1 = 3 × 3 × 4 - 1 = 35
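The degrees of freedom in Table 5.30 follow directly from r, a, and b. The sketch below (hypothetical function name) reproduces them and, with r = 4, a = 7, and b = 4, also reproduces Table 5.31 for Example 5.1.

```python
def split_plot_df(r, a, b):
    """ANOVA degrees of freedom for a split plot in r randomized complete blocks.

    a: levels of the whole-plot factor A; b: levels of the subplot factor B.
    """
    return {
        "Blocks": r - 1,
        "Factor A": a - 1,
        "Error(a)": (a - 1) * (r - 1),
        "Factor B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "Error(b)": a * (r - 1) * (b - 1),
        "Total": r * a * b - 1,
    }


for source, df in split_plot_df(r=3, a=3, b=4).items():
    print(f"{source:10s} {df}")
```

The whole-plot error (Error(a)) and subplot error (Error(b)) degrees of freedom are the denominator degrees of freedom of the F-tests in Table 5.32: 18 for factor A, and 63 for factor B and the A × B interaction.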


Example 5.1 A split-plot design in randomized complete block arrangement with a 
Poisson response 


A split plot is probably the most common design structure in plant and soil 
research. Such experiments involve two or more treatment factors. Typically, large 
units called whole plots are grouped into blocks. The levels of the first factor are 
randomly assigned to whole plots. Each whole plot is divided into smaller units, 
called subplots (split plots). Next, the levels of the second factor are randomly 
assigned to units of split plots within each whole plot. 

In this example, four blocks were used, each divided into seven parts, the whole plots, for the seven levels of the first factor (A1, A2, A3, A4, A5, A6, and A7). Each whole plot was then divided into four units, the subplots, to which the four levels of factor B (B1, B2, B3, and B4) were randomly assigned. Both factors were used to control the growth of a particular weed, and both were randomly allocated in each block, as shown below:


[Layout of Block 1 and Block 4: in each block, the levels A1 to A7 are randomized to the seven whole plots, and the levels B1 to B4 are randomized to the four subplots within each whole plot.]
The sources of variation and degrees of freedom for this experiment are shown in Table 5.31:

Table 5.31 Sources of variation and degrees of freedom for a randomized block design with a split-plot treatment arrangement

Sources of variation   Degrees of freedom
Blocks                 r - 1 = 4 - 1 = 3
Factor A               a - 1 = 7 - 1 = 6
Error(a) (A × r)       (a - 1)(r - 1) = 18
Factor B               b - 1 = 4 - 1 = 3
A × B                  (a - 1)(b - 1) = 18
Error(b)               a(r - 1)(b - 1) = 7 × 3 × 3 = 63
Total                  r × a × b - 1 = 4 × 7 × 4 - 1 = 111

In this experiment, the response variable was the number of weeds in each plot (Appendix 1: Weed counts). The components that define this GLMM are as follows:


Distribution: $y_{ijk} \mid r_k, \alpha(r)_{ik} \sim \text{Poisson}(\lambda_{ijk})$
$r_k \sim N(0, \sigma_r^2)$, $\alpha(r)_{ik} \sim N(0, \sigma_{\alpha(r)}^2)$
Linear predictor: $\eta_{ijk} = \eta + \alpha_i + r_k + \alpha(r)_{ik} + \beta_j + (\alpha\beta)_{ij}$
Link function: $\log(\lambda_{ijk}) = \eta_{ijk}$

where $\eta_{ijk}$ is the linear predictor that relates the effect of level i of factor A (i = 1, 2, ..., 7) and level j of factor B (j = 1, 2, 3, 4) in block k (k = 1, 2, 3, 4); $\eta$ is the intercept, $\alpha_i$ is the fixed effect of level i of factor A, $\beta_j$ is the fixed effect of level j of factor B, $(\alpha\beta)_{ij}$ is the fixed effect of the interaction between level i of factor A and level j of factor B, $r_k$ is the random effect of block k, and $\alpha(r)_{ik}$ is the random whole-plot error effect, assuming $r_k \sim N(0, \sigma_r^2)$ and $\alpha(r)_{ik} \sim N(0, \sigma_{\alpha(r)}^2)$, respectively. The model uses this linear predictor ($\eta_{ijk}$) to estimate the treatment means ($\lambda_{ijk} = \exp(\eta_{ijk})$).


The following GLIMMIX program fits a split-plot block design with a Poisson response variable:

proc glimmix method=laplace;
class block a b;
model count = a|b / dist=poisson link=log;   /* Poisson response with log link */
random block block*a;                         /* block and whole-plot (block*a) random effects */
lsmeans a|b / lines ilink;
run;


Part of the output is shown in Table 5.32. Unlike in the Latin square example above, the Poisson model is inadequate here, because the value of Pearson's chi-square statistic divided by its degrees of freedom is greater than 1 (Pearson's chi-square/DF = 4.50; part (a), Table 5.32). This indicates that either the conditional distribution of y | b or the linear predictor has probably been misspecified; in this case, the evidence suggests looking for another distribution for this dataset. In addition, part (b) tabulates the variance component estimates due to blocks and blocks × A ($\hat{\sigma}_r^2 = 0.01526$; $\hat{\sigma}_{\alpha(r)}^2 = 0.2454$). The type III tests of fixed effects (part (c)) show a significant effect of factor B and of the interaction between both factors.

Table 5.32 Results of the analysis of variance

(a) Fit statistics for conditional distribution

-2 Log L (count | r. effects)   1053.96
Pearson's chi-square            504.44
Pearson's chi-square/DF         4.50

(b) Covariance parameter estimates

Cov Parm   Estimate  Standard error
Block      0.01526   0.03867
Block*A    0.2454    0.07565

(c) Type III tests of fixed effects

Effect  Num DF  Den DF  F-value  Pr > F
A       6       18      2.32     0.0775
B       3       63      22.91    <0.0001
A*B     18      63      10.06    <0.0001

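An alternative for handling the overdispersion is to keep the same linear predictor while replacing the Poisson distribution of the response variable with the negative binomial distribution, that is:

Distribution: $y_{ijk} \mid r_k, \alpha(r)_{ik} \sim \text{Negative binomial}(\lambda_{ijk}, \phi)$
$r_k \sim \text{iid } N(0, \sigma_r^2)$, $\alpha(r)_{ik} \sim \text{iid } N(0, \sigma_{\alpha(r)}^2)$
Linear predictor: $\eta_{ijk} = \eta + \alpha_i + r_k + \alpha(r)_{ik} + \beta_j + (\alpha\beta)_{ij}$
Link function: $\log(\lambda_{ijk}) = \eta_{ijk}$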
The following syntax fits a GLMM under a negative binomial distribution.

proc glimmix method=laplace;
class block a b;
model count = a|b / dist=negbin link=log;
random intercept a / subject=block;   /* same random effects as: random block block*a; */
lsmeans a|b / lines ilink;
run;


Part of the output is shown in Table 5.33. The results in part (a) indicate that the overdispersion has been removed from the analysis (Pearson's chi-square/DF = 0.71). The variance component estimates, tabulated in part (b), are $\hat{\sigma}_r^2 = 0.0024$ and $\hat{\sigma}_{\alpha(r)}^2 = 0.1222$ for blocks and blocks × A, respectively. The estimated scale parameter is $\hat{\phi} = 0.3458$. Note that the results under the negative binomial distribution differ from those obtained under the Poisson distribution, because the negative binomial distribution better captures the overdispersion. The fixed effects F-test for factor A is significant at the 5% significance level (part (c)), whereas factor B and the interaction do not significantly influence the response variable.

Table 5.33 Results of the analysis of variance

(a) Fit statistics for conditional distribution

-2 Log L (count | r. effects)   838.51
Pearson's chi-square            79.36
Pearson's chi-square/DF         0.71

(b) Covariance parameter estimates

Cov Parm   Subject  Estimate  Standard error
Intercept  Block    0.002421  0.02768
A          Block    0.1222    0.07102
Scale               0.3458    0.06875

(c) Type III tests of fixed effects

Effect  Num DF  Den DF  F-value  Pr > F
A       6       18      2.71     0.0473
B       3       63      2.13     0.1054
A*B     18      63      1.18     0.3017
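How the negative binomial absorbs overdispersion can be seen from its variance function. The sketch below assumes the parameterization used by PROC GLIMMIX for dist=negbin, Var(Y) = λ + φλ², where φ is the reported “Scale”. With φ̂ = 0.3458 from Table 5.33, the variance grows much faster than the Poisson variance Var(Y) = λ as the mean increases. The function names are hypothetical.

```python
def poisson_variance(mean):
    """Poisson variance function: Var(Y) = mean."""
    return mean


def negbin_variance(mean, scale):
    """Negative binomial variance, assuming Var(Y) = mean + scale * mean**2."""
    return mean + scale * mean ** 2


phi_hat = 0.3458  # "Scale" estimate, Table 5.33, part (b)
for lam in (2.0, 5.0, 10.0):
    print(lam, poisson_variance(lam), round(negbin_variance(lam, phi_hat), 3))
```

At a mean of 5 weeds, for example, the assumed negative binomial variance is 5 + 0.3458 × 25 ≈ 13.6, almost three times the Poisson variance.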


Example 5.2 A split-split plot in time in a randomized complete block design with a 
Poisson response. 


The propagation of coffee seedlings through grafting in nurseries depends on 
several factors such as the type of substrate, the rootstock of the plant that will host 
the graft, type of graft, light intensity, type and size of the container, humidity, 
temperature, and so forth. The objective of this experiment was to evaluate the effect 
of shade cloth (light intensity), type of container, and clone on the number of leaves 
produced by the Coffea canephora P. clones grafted with the Coffea arabica 
L. variety Oro azteca. 

The factors studied were the color of the shade cloth (black, pearl, and red), the type of container (tray, 0.5 kg, and 1 kg), and five clones of Coffea canephora P. plus a franc foot of Coffea arabica L. (code Pf), all grafted with Coffea arabica L. var. Oro azteca, evaluated over a period of 11 months (Appendix 1: Coffee data). The clones used in the experiment are listed in Table 5.34. Different physiological parameters were evaluated during the 11 months.

This work was implemented in four randomized complete blocks. The following 
table exemplifies how a block was constructed. 
Table 5.34 Clones of Coffea canephora P.

Graft carrier (Coffea canephora P.)   Grafting (Coffea arabica L.)   Code
Clone 1                               Var. Aztec gold                C1
Clone 2                               Var. Aztec gold                C2
Clone 3                               Var. Aztec gold                C3
Clone 4                               Var. Aztec gold                C4
Clone 5                               Var. Aztec gold                C5
Franc foot: Coffea arabica L.         Var. Aztec gold                Pf

Shade cloth    Red                     Pearl                   Black
Container      Tray   0.5 kg  1 kg     Tray   0.5 kg  1 kg     Tray   0.5 kg  1 kg
               C2     C5      C4       C4     C5      C2       C5     C2      C4
               C4     Pf      C3       C3     Pf      C4       Pf     C3      C2
               C3     C1      C5       C5     C1      C3       C1     C5      C3
               C5     C2      Pf       Pf     C2      C5       C2     Pf      C5
               Pf     C4      C1       C1     C4      Pf       C4     C1      Pf
               C1     C3      C2       C2     C3      C1       C3     C4      C1


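The statistical model describing a split-split plot in time design is as follows:

$$y_{ijklm} = \mu + \alpha_i + r_m + (\alpha r)_{im} + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + (r\alpha\beta\gamma)_{ijkm} + \tau_l + (\alpha\tau)_{il} + (\beta\tau)_{jl} + (\alpha\beta\tau)_{ijl} + (\gamma\tau)_{kl} + (\alpha\gamma\tau)_{ikl} + (\beta\gamma\tau)_{jkl} + (\alpha\beta\gamma\tau)_{ijkl} + \varepsilon_{ijklm}$$

i = 1, 2, 3; j = 1, 2, ..., 6; k = 1, 2, 3; l = 1, 2, ..., 11; m = 1, 2, 3, 4

where $y_{ijklm}$ is the response variable in replicate (block) m, shade cloth i, clone j, and tray k at time l; $\mu$ is the overall mean; $\alpha_i$ is the fixed effect of the type of shade cloth; $\beta_j$, $\gamma_k$, and $\tau_l$ are the fixed effects of clone, tray, and sampling time, respectively; $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, $(\alpha\tau)_{il}$, $(\beta\tau)_{jl}$, and $(\gamma\tau)_{kl}$ are the two-factor interaction effects among shade cloth, clone, tray, and sampling time; $(\alpha\beta\gamma)_{ijk}$, $(\alpha\beta\tau)_{ijl}$, $(\alpha\gamma\tau)_{ikl}$, $(\beta\gamma\tau)_{jkl}$, and $(\alpha\beta\gamma\tau)_{ijkl}$ are the three- and four-factor interaction effects; $r_m$, $(\alpha r)_{im}$, and $(r\alpha\beta\gamma)_{ijkm}$ are the random effects due to blocks, blocks × shade cloth, and blocks × shade cloth × clone × tray, assuming $r_m \sim N(0, \sigma_r^2)$, $(\alpha r)_{im} \sim N(0, \sigma_{\alpha r}^2)$, and $(r\alpha\beta\gamma)_{ijkm} \sim N(0, \sigma_{r\alpha\beta\gamma}^2)$; and $\varepsilon_{ijklm}$ is the random error, $\varepsilon_{ijklm} \sim N(0, \sigma^2)$.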

The following SAS program fits a GLMM in a split-split plot in time under a randomized complete block design with a Poisson response.

proc glimmix data=work.Nhojas_cafe nobound method=laplace;
class shade clone tray rep time;
model y = shade|clone|tray|time / dist=poisson link=log;
/* random rep, rep*shade, and rep*shade*clone*tray effects; type= sets the covariance structure */
random intercept shade shade*clone*tray / subject=rep type=ar(1);
lsmeans shade|clone|tray|time / lines ilink;
run;

Some of the results are listed below. To study which correlation structure best fits this experimental design, five types of correlation structures were tested (Table 5.35): compound symmetry (“CS”), first-order autoregressive (“AR(1)”), unstructured (“UN”), first-order Toeplitz (“TOEP(1)”), and first-order ante-dependence (“ANTE(1)”). To do this, the structure to be tested is specified with the “type” option of the “random” statement, and this option is changed for each fit. Table 5.35 reports the goodness-of-fit statistics used to choose among these variance-covariance structures: AR(1) has the smallest value of every information criterion, although its advantage over CS is small (about 1 unit of AIC), and the UN and ANTE(1) fits did not converge.

Table 5.35 Fit statistics for choosing the correlation structure

Fit statistic               CS        AR(1)     UN             TOEP(1)   ANTE(1)
-2 Log likelihood           29047.38  29043.38  Not converged  29053.85  Not converged
AIC (smaller is better)     30236.38  30235.38                 30243.85
AICC (smaller is better)    30338.31  30337.31                 30345.44
BIC (smaller is better)     29871.61  29869.61                 29878.70
CAIC (smaller is better)    30469.61  30465.61                 30473.70
HQIC (smaller is better)    29435.72  29432.72                 29442.55

Table 5.36 shows the conditional fit statistics and variance component estimates. The fit statistic Pearson's chi-square/DF = 0.56 in part (a) indicates that, in the conditional model, there is no evidence of misspecifying the distribution or the linear predictor. In other words, there is no overdispersion in the dataset, and it is therefore reasonable to base the analysis and inference on the Poisson model.

Table 5.36 Conditional fit statistics and variance component estimates

(a) Fit statistics for conditional distribution

-2 Log L (y | r. effects)   28709.17
Pearson's chi-square        4288.74
Pearson's chi-square/DF     0.56

(b) Covariance parameter estimates

Cov Parm   Subject  Estimate   Standard error
Variance   Rep      0.008106   0.001392
AR(1)      Rep      -0.3254    0.09437
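Choosing the covariance structure amounts to taking, among the fits that converged, the one with the smallest information criterion. The sketch below (hypothetical names) encodes three of the criteria from Table 5.35 and applies that rule; each of them selects AR(1).

```python
# Information criteria from Table 5.35 (smaller is better); None marks the
# structures whose fit did not converge.
fit_stats = {
    "CS": {"AIC": 30236.38, "AICC": 30338.31, "BIC": 29871.61},
    "AR(1)": {"AIC": 30235.38, "AICC": 30337.31, "BIC": 29869.61},
    "UN": None,
    "TOEP(1)": {"AIC": 30243.85, "AICC": 30345.44, "BIC": 29878.70},
    "ANTE(1)": None,
}


def best_structure(stats, criterion):
    """Name of the converged structure with the smallest value of `criterion`."""
    converged = {name: s for name, s in stats.items() if s is not None}
    return min(converged, key=lambda name: converged[name][criterion])


for criterion in ("AIC", "AICC", "BIC"):
    print(criterion, best_structure(fit_stats, criterion))
```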

The type III tests of fixed effects (Table 5.37) indicate highly significant main effects of clone (P < 0.0001), tray (P < 0.0001), and time (P < 0.0001), as well as significant effects of most of the interactions. The exceptions are the main effect of shade cloth (P = 0.1011) and the interactions shade cloth*clone (P = 0.3846), shade cloth*clone*time (P = 0.9289), clone*tray*time (P = 0.9760), and shade cloth*clone*tray*time (P = 0.2484), which are not significant at the 5% level.

Table 5.37 Type III fixed effects tests

Type III tests of fixed effects

Effect                 Num DF  Den DF  F-value  Pr > F
Shade                  2       6       3.44     0.1011
Clone                  5       153     16.38    <0.0001
Shade*clone            10      153     1.08     0.3846
Tray                   2       153     56.60    <0.0001
Shade*tray             4       153     8.83     <0.0001
Clone*tray             10      153     2.86     0.0027
Shade*clone*tray       20      153     1.71     0.0363
Time                   10      6822    721.20   <0.0001
Shade*time             20      6822    6.91     <0.0001
Clone*time             50      6822    3.17     <0.0001
Shade*clone*time       100     6822    0.80     0.9289
Tray*time              20      6822    9.03     <0.0001
Shade*tray*time        40      6822    2.42     <0.0001
Clone*tray*time        100     6822    0.74     0.9760
Shade*clone*tray*time  200     6822    1.07     0.2484

Table 5.38 Estimated means on the model scale and on the data scale for the shade cloth

(a) Shade cloth least squares means

Shade cloth  Estimate  Standard error  DF  t-value  Pr > |t|  Mean    Standard error mean
Black        1.6221    0.01542         2   105.17   <0.0001   5.0638  0.07810
Pearl        1.5472    0.01533         2   100.94   <0.0001   4.6981  0.07201
Red          1.7184    0.01301         2   132.09   <0.0001   5.5757  0.07254

(b) T grouping of shade cloth least squares means (α = 0.05)

LS means with the same letter are not significantly different

Shade cloth  Estimate ($\eta_i$)
Red          1.7184    A
Black        1.6221    B
Pearl        1.5472    B

The means and standard errors for the type of shade cloth on the data scale are shown in the “Mean” column in part (a) of Table 5.38, and the corresponding mean comparisons are shown in part (b).
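The denominator degrees of freedom in Table 5.37 appear to follow the error strata of the design. Treating the 6 × 3 = 18 clone × tray combinations as the treatments applied to the units defined by the rep × shade cloth × clone × tray random effect, the usual split-plot formulas give (a - 1)(r - 1) = 2 × 3 = 6 for shade cloth and a(r - 1)(18 - 1) = 3 × 3 × 17 = 153 for clone, tray, and their interactions with shade cloth, as in the table. The sketch below (hypothetical function name) encodes this arithmetic. It is an interpretation of the tabulated values, and it does not cover the 6822 degrees of freedom of the terms involving time.

```python
def split_plot_error_df(r, a, n_sub):
    """Whole-plot and subplot error degrees of freedom for a split plot in RCBD.

    r: number of blocks (reps); a: levels of the whole-plot factor;
    n_sub: number of subplot treatment combinations.
    """
    whole_plot_error = (a - 1) * (r - 1)
    subplot_error = a * (r - 1) * (n_sub - 1)
    return whole_plot_error, subplot_error


# Coffee experiment: 4 reps, 3 shade cloths, 6 clones x 3 trays = 18 combinations
print(split_plot_error_df(r=4, a=3, n_sub=6 * 3))
```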

Table 5.39 presents the estimates of the linear predictor (“Estimate” column) on the model scale and the treatment means on the data scale (“Mean” column) for the type of clone (part (a)). Part (b) of Table 5.39 presents the mean comparisons for the type of clone.


Table 5.39 Estimated means on the model scale and on the data scale for the type of clone

(a) Clone least squares means

Clone  Estimate  Standard error  DF   t-value  Pr > |t|  Mean    Standard error mean
C1     1.5008    0.04989         153  30.08    <0.0001   4.4854  0.2238
C2     1.4250    0.05080         153  28.05    <0.0001   4.1578  0.2112
C3     1.5064    0.05019         153  30.02    <0.0001   4.5106  0.2264
C4     1.4750    0.05029         153  29.33    <0.0001   4.3709  0.2198
C5     1.5965    0.04970         153  32.12    <0.0001   4.9357  0.2453
Pf     1.6344    0.04943         153  33.07    <0.0001   5.1264  0.2534

(b) T grouping of clone least squares means (α = 0.05)

LS means with the same letter are not significantly different

Clone  Estimate
Pf     1.6344    A
C5     1.5965    A
C3     1.5064    B
C1     1.5008    B
C4     1.4750    B C
C2     1.4250    C


Table 5.40 Estimated means on the model scale and on the data scale for the tray factor

(a) Tray least squares means

Tray  Estimate  Standard error  DF   t-value  Pr > |t|  Mean    Standard error mean
CH1   1.3843    0.04859         153  28.49    <0.0001   3.9921  0.1940
CH2   1.5665    0.04838         153  32.38    <0.0001   4.7898  0.2317
CH3   1.6183    0.04819         153  33.58    <0.0001   5.0443  0.2431

(b) T grouping of tray least squares means (α = 0.05)

LS means with the same letter are not significantly different

Tray  Estimate
CH3   1.6183    A
CH2   1.5665    B
CH1   1.3843    C


Table 5.40 presents the estimates for the levels of the tray on both scales (part (a)), and part (b) presents the treatment mean comparisons for the levels of the tray.

Tables 5.41, 5.42, 5.43, and 5.44 show the means and standard errors on both scales for the two-factor and three-factor interactions:

* Table 5.41: interaction type of shade cloth*clone

* Table 5.42: interaction type of shade cloth*tray

* Table 5.43: interaction clone*tray

* Table 5.44: interaction shade cloth*clone*tray

Although it is not the objective of this book, part of the results is discussed below. 
In Fig. 5.8, it is possible to observe that the red shade cloth significantly stimulates 
leaf production in coffee grafts, followed by the black and pearl shade cloths. The 
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Table 5.41 Estimated means on the model scale and on the data scale for the type of shade 
cloth*clone 


Shade cloth*clone least squares means 


Shade Standard t- Standard error 
cloth Clone | Estimate | error DF | value Pr > Id Mean | mean 
Black Cl 1.5109 0.06230 153 | 24.25 | <0.0001 | 4.5307 | 0.2823 
Black C2 1.3340 0.06507 153 | 20.50 | «0.0001 | 3.7961 | 0.2470 
Black C3 1.4990 0.06354 153 | 23.59 | «0.0001 | 4.4771 | 0.2845 
Black C4 1.4485 0.06425 153 |22.54 | «0.0001 | 4.2566 | 0.2735 
Black C5 1.5916 0.06163 153 |25.83 | «0.0001 |4.9118 | 0.3027 
Black pf 1.6219 0.06114 153 |26.53 | «0.0001 | 5.0628 | 0.3095 
Pearl СІ 1.3835 0.07711 153 | 17.94 | <0.0001 | 3.9889 | 0.3076 
Pearl C2 1.3926 0.07781 153 | 17.90 | <0.0001 | 4.0254 | 0.3132 
Pearl C3 1.4028 0.07589 153 | 18.49 | <0.0001 | 4.0666 | 0.3086 
Pearl C4 1.4288 0.07575 153 | 18.86 | <0.0001 | 4.1736 | 0.3161 
Pearl C5 1.5216 0.07536 153 |20.19 | «0.0001 | 4.5797 | 0.3451 
Pearl pf 1.5285 0.07458 153 |20.50 | «0.0001 |4.6112 | 0.3439 
Red СІ 1.6081 0.06991 153 |23.00 | <0.0001 | 4.9933 | 0.3491 
кеа C2 1.5483 0.07100 153 21.81 | «0.0001 | 4.7036 | 0.3339 
Red C3 1.6175 0.07055 153 | 22.93 | «0.0001 | 5.0404 | 0.3556 
Red C4 1.5477 0.07072 153 |21.88 | «0.0001 | 4.7005 | 0.3324 
Red C5 1.6762 0.06971 153 |24.04 | «0.0001 |5.3451 | 0.3726 
Red pf 1.7528 0.06923 153 |25.32 | «0.0001 | 5.7707 | 0.3995 


Table 5.42 Estimated means on the model scale and on the data scale for the interaction type of shade cloth*tray

Shade*tray least squares means

Shade cloth  Tray  Estimate  Standard error  DF   t value  Pr > |t|  Mean    Standard error mean
Black        CH1   1.4274    0.05961         153  23.94    <0.0001   4.1679  0.2485
Black        CH2   1.5523    0.05846         153  26.55    <0.0001   4.7224  0.2761
Black        CH3   1.5232    0.05824         153  26.15    <0.0001   4.5869  0.2672
Pearl        CH1   1.2070    0.07354         153  16.41    <0.0001   3.3434  0.2459
Pearl        CH2   1.4972    0.07218         153  20.74    <0.0001   4.4691  0.3226
Pearl        CH3   1.6247    0.07145         153  22.74    <0.0001   5.0771  0.3628
Red          CH1   1.5185    0.06733         153  22.55    <0.0001   4.5655  0.3074
Red          CH2   1.6499    0.06714         153  24.57    <0.0001   5.2066  0.3496
Red          CH3   1.7068    0.06732         153  25.35    <0.0001   5.5114  0.3710


production of leaves in coffee grafts shows a bimodal pattern, which may be due to factors such as humidity and temperature: extreme conditions of either factor stress the growing points and, therefore, affect the appearance of new leaves.

Regarding the clone used as rootstock, the clones showed higher average leaf production in months 5 and 6, whereas the lowest production was observed in
Table 5.43 Estimated means on the model scale and on the data scale for the clone*tray interaction

Clone*tray least squares means

Clone  Tray  Estimate  Standard error  DF   t value  Pr > |t|  Mean    Standard error mean
C1     CH1   1.3916    0.06112         153  22.77    <0.0001   4.0214  0.2458
C1     CH2   1.5502    0.05861         153  26.45    <0.0001   4.7122  0.2762
C1     CH3   1.5607    0.05861         153  26.63    <0.0001   4.7622  0.2791
C2     CH1   1.2242    0.06459         153  18.95    <0.0001   3.4014  0.2197
C2     CH2   1.4780    0.06029         153  24.51    <0.0001   4.3843  0.2644
C2     CH3   1.5727    0.05890         153  26.70    <0.0001   4.8196  0.2839
C3     CH1   1.2924    0.06114         153  21.14    <0.0001   3.6414  0.2226
C3     CH2   1.5433    0.05975         153  25.83    <0.0001   4.6799  0.2796
C3     CH3   1.6836    0.05841         153  28.83    <0.0001   5.3851  0.3145
C4     CH1   1.2982    0.06251         153  20.77    <0.0001   3.6626  0.2289
C4     CH2   1.5829    0.05815         153  27.22    <0.0001   4.8690  0.2831
C4     CH3   1.5439    0.05939         153  26.00    <0.0001   4.6828  0.2781
C5     CH1   1.5311    0.05843         153  26.20    <0.0001   4.6234  0.2702
C5     CH2   1.5981    0.05920         153  26.99    <0.0001   4.9438  0.2927
C5     CH3   1.6602    0.05803         153  28.61    <0.0001   5.2604  0.3053
pf     CH1   1.5684    0.05794         153  27.07    <0.0001   4.7989  0.2781
pf     CH2   1.6464    0.05833         153  28.23    <0.0001   5.1884  0.3026
pf     CH3   1.6884    0.05728         153  29.48    <0.0001   5.4107  0.3099


months 1, 2, 8, and 9. The franc foot (pf) showed a higher average number of leaves than the other clones (Fig. 5.9).


5.3 Exercises 


Exercise 5.3.1 A plant scientist wants to know how an in vitro plant culture responds to different concentrations (ppm) of a chemical compound, measured as the number of outbreaks produced by each explant. The data for this experiment are given below (Table 5.45):


(a) Write down the analysis of variance table (sources of variation and degrees of 
freedom). 

(b) Write down the components of the GLMM. 

(c) Analyze the dataset with the model proposed in (b). 

(d) Compare and contrast the results of these analyses. If necessary, reanalyze the dataset using the same model as above, but now assume that the data have a negative binomial distribution.

(e) Summarize the relevant results. 
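Part (d) depends on whether the counts are overdispersed relative to the Poisson distribution. One rough check is the Pearson chi-square divided by its degrees of freedom from a fixed-effects Poisson model with one mean per concentration. Values well above 1 suggest overdispersion. This is offered as an illustrative sketch, not the book's procedure. Only the 100 and 200 ppm rows of Table 5.45 are used, and the function name `pearson_dispersion` is hypothetical.

```python
def pearson_dispersion(groups):
    """Pearson chi-square, residual df, and X2/df for a Poisson model with one
    mean per group; the maximum likelihood fit of each group mean is its sample mean."""
    x2 = 0.0
    n = 0
    for counts in groups.values():
        mu = sum(counts) / len(counts)                 # fitted Poisson mean for the group
        x2 += sum((y - mu) ** 2 / mu for y in counts)  # squared Pearson residuals
        n += len(counts)
    df = n - len(groups)                               # one mean parameter per group
    return x2, df, x2 / df


# Number of outbreaks per explant at 100 and 200 ppm (Table 5.45)
outbreaks = {
    100: [46, 55, 54, 49, 55, 55, 47, 42, 38, 50, 46, 42, 44, 30, 38, 31, 42],
    200: [36, 37, 27, 38, 25, 29, 30, 30, 37, 28, 37, 29, 36, 34, 27, 32, 37, 30, 31, 30],
}

x2, df, ratio = pearson_dispersion(outbreaks)
print(f"X2 = {x2:.2f}, df = {df}, X2/df = {ratio:.3f}")  # X2 = 31.76, df = 35, X2/df = 0.907
```

Here X²/df is about 0.91, which gives no sign of overdispersion in these two groups. Because the check ignores random effects and the remaining concentrations, the Pearson statistic from the GLMM fitted in part (c) is the better guide.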
Table 5.44 Estimated means on the model scale and on the data scale for the shade*clone*tray interaction

Shade*clone*tray least squares means

Shade cloth  Clone  Tray  Estimate  Standard error  DF   t value  Pr > |t|  Mean    Standard error mean
Black        C1     CH1   1.2821    0.1528          153  8.39     <0.0001   3.6041  0.5509
Black        C1     CH2   1.4143    0.1521          153  9.30     <0.0001   4.1136  0.6258
Black        C1     CH3   1.2201    0.1538          153  7.93     <0.0001   3.3874  0.5209
Black        C2     CH1   0.8131    0.1615          153  5.04     <0.0001   2.2549  0.3641
Black        C2     CH2   1.1486    0.1543          153  7.45     <0.0001   3.1538  0.4866
Black        C2     CH3   1.3376    0.1533          153  8.72     <0.0001   3.8100  0.5842
Black        C3     CH1   1.1809    0.1548          153  7.63     <0.0001   3.2574  0.5041
Black        C3     CH2   1.1105    0.1550          153  7.17     <0.0001   3.0359  0.4705
Black        C3     CH3   1.3672    0.1528          153  8.95     <0.0001   3.9242  0.5996
Black        C4     CH1   0.7672    0.1608          153  4.77     <0.0001   2.1538  0.3462
Black        C4     CH2   1.4660    0.1517          153  9.66     <0.0001   4.3318  0.6573
Black        C4     CH3   1.3925    0.1523          153  9.14     <0.0001   4.0250  0.6131
Black        C5     CH1   1.2316    0.1538          153  8.01     <0.0001   3.4269  0.5270
Black        C5     CH2   1.6090    0.1507          153  10.67    <0.0001   4.9979  0.7534
Black        C5     CH3   1.4684    0.1515          153  9.70     <0.0001   4.3422  0.6577
Black        Pf     CH1   1.6751    0.1503          153  11.15    <0.0001   5.3393  0.8025
Black        Pf     CH2   1.3126    0.1548          153  8.48     <0.0001   3.7160  0.5753
Black        Pf     CH3   1.5092    0.1511          153  9.99     <0.0001   4.5231  0.6834
Pearl        C1     CH1   0.6441    0.1741          153  3.70     0.0003    1.9043  0.3314
Pearl        C1     CH2   1.3602    0.1639          153  8.30     <0.0001   3.8970  0.6387
Pearl        C1     CH3   1.6030    0.1633          153  9.82     <0.0001   4.9678  0.8111
Pearl        C2     CH1   0.6336    0.1741          153  3.64     0.0004    1.8844  0.3281
Pearl        C2     CH2   1.2050    0.1672          153  7.21     <0.0001   3.3366  0.5579
Pearl        C2     CH3   1.5547    0.1635          153  9.51     <0.0001   4.7335  0.7740
Pearl        C3     CH1   0.8786    0.1684          153  5.22     <0.0001   2.4074  0.4053
Pearl        C3     CH2   1.2777    0.1646          153  7.76     <0.0001   3.5885  0.5905
Pearl        C3     CH3   1.5724    0.1637          153  9.60     <0.0001   4.8184  0.7889
Pearl        C4     CH1   0.9893    0.1680          153  5.89     <0.0001   2.6893  0.4519
Pearl        C4     CH2   1.4198    0.1636          153  8.68     <0.0001   4.1362  0.6769
Pearl        C4     CH3   1.4357    0.1646          153  8.72     <0.0001   4.2026  0.6919
Pearl        C5     CH1   1.4557    0.1631          153  8.93     <0.0001   4.2875  0.6992
Pearl        C5     CH2   1.1672    0.1696          153  6.88     <0.0001   3.2130  0.5449
Pearl        C5     CH3   1.6010    0.1633          153  9.80     <0.0001   4.9582  0.8098
Pearl        Pf     CH1   1.1901    0.1649          153  7.22     <0.0001   3.2875  0.5422
Pearl        Pf     CH2   1.4004    0.1643          153  8.52     <0.0001   4.0570  0.6665
Pearl        Pf     CH3   1.7623    0.1620          153  10.88    <0.0001   5.8260  0.9440
Red          C1     CH1   1.5245    0.1606          153  9.49     <0.0001   4.5930  0.7379
Red          C1     CH2   1.6004    0.1605          153  9.97     <0.0001   4.9548  0.7953
Red          C1     CH3   1.6327    0.1607          153  10.16    <0.0001   5.1178  0.8224
Red          C2     CH1   1.3462    0.1630          153  8.26     <0.0001   3.8430  0.6264
Table 5.44 (continued)

Shade*clone*tray least squares means

Shade cloth  Clone  Tray  Estimate  Standard error  DF   t value  Pr > |t|  Mean    Standard error mean
Red          C2     CH2   1.4270    0.1622          153  8.80     <0.0001   4.1663  0.6759
Red          C2     CH3   1.6500    0.1620          153  10.19    <0.0001   5.2071  0.8435
Red          C3     CH1   1.3915    0.1632          153  8.53     <0.0001   4.0207  0.6563
Red          C3     CH2   1.4491    0.1614          153  8.98     <0.0001   4.2592  0.6872
Red          C3     CH3   1.7875    0.1603          153  11.15    <0.0001   5.9747  0.9577
Red          C4     CH1   1.3961    0.1614          153  8.65     <0.0001   4.0394  0.6520
Red          C4     CH2   1.5874    0.1606          153  9.89     <0.0001   4.8910  0.7854
Red          C4     CH3   1.3805    0.1635          153  8.44     <0.0001   3.9768  0.6503
Red          C5     CH1   1.6313    0.1601          153  10.19    <0.0001   5.1103  0.8180
Red          C5     CH2   1.6395    0.1605          153  10.22    <0.0001   5.1527  0.8268
Red          C5     CH3   1.6470    0.1610          153  10.23    <0.0001   5.1912  0.8360
Red          Pf     CH1   1.7075    0.1600          153  10.67    <0.0001   5.5151  0.8825
Red          Pf     CH2   1.7594    0.1594          153  11.04    <0.0001   5.8087  0.9260
Red          Pf     CH3   1.7568    0.1601          153  10.97    <0.0001   5.7938  0.9273
Fig. 5.8 Effect of mesh type on the average number of leaves
Fig. 5.9 Effect of mesh type on the average number of leaves


Exercise 5.3.2 Earthworms (Lumbricus terrestris L.) were counted in four replicates of a factorial experiment at the W.K. Kellogg Biological Station in Battle Creek, Michigan, in 1995. A 2⁴ factorial experiment was conducted. The factors and their levels were plowing (chiseled and unplowed), input level (conventional and low), manure application (yes and no), and crop (corn and soybean). The objective was to determine whether L. terrestris density varies among these management protocols and how the factors act and interact. The data (not pooled) in Table 5.46 are the total worm counts (juvenile and adult worms, per square foot) for the 64 (16 × 4) experimental units of the 2⁴ factorial design; the numbers in each cell of the table are the counts in the four replicates.


(a) Write down the analysis of variance table (sources of variation and degrees of 
freedom). 

(b) Write down the components of the GLMM. 

(c) Analyze the dataset with the model proposed in (b). 

(d) Summarize the relevant results. 
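The 2⁴ structure described above can be made explicit by enumerating the design. The Python sketch below builds one record per experimental unit (16 treatment combinations × 4 replicates = 64 units) and counts the factorial effects of each order. The factor and level labels are chosen here for illustration. With two-level factors, every main effect and interaction has 1 degree of freedom, so the treatment structure carries 15 degrees of freedom in total.

```python
from itertools import product
from math import comb

# Factors and levels of the earthworm experiment (labels chosen for illustration)
factors = {
    "tillage": ["chisel", "no-till"],
    "input": ["low", "conventional"],
    "manure": ["yes", "no"],
    "crop": ["corn", "soybean"],
}
replicates = [1, 2, 3, 4]

# One record per experimental unit: every treatment combination in every replicate
layout = [dict(zip(factors, levels), rep=rep)
          for levels in product(*factors.values())
          for rep in replicates]

# Number of factorial effects of each order; each has 1 df with two-level factors
k = len(factors)
effects_by_order = {order: comb(k, order) for order in range(1, k + 1)}

print(len(layout))                     # experimental units
print(effects_by_order)                # main effects, 2-, 3-, and 4-factor interactions
print(sum(effects_by_order.values()))  # total treatment df
```

A long-format layout like this, with one row per unit and one column per factor, is the usual input when fitting a GLMM to count data.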


Exercise 5.3.3 This experiment investigates genotypic variation within cultivars of leek (Allium porrum L.) with respect to adventitious shoot formation in callus tissue. The data in Table 5.47 refer to 20 genotypes of one cultivar, each represented by six calluses; the observations are the number of shoots per callus. The data are subject to two sources of variation: variation between genotypes and variation between calluses within genotypes.
Table 5.45 In vitro culture (Conc = concentration in ppm)


5 Generalized Linear Mixed Models for Counts 


Conc Explant No. of outbreaks Conc Explant No. of outbreaks 
0 1 50 13 54 
0 2 50 15 35 
0 3 50 16 50 
0 4 50 17 51 
0 5 50 18 38 
0 6 50 19 61 
0 7 100 1 46 
0 8 100 2 55 
0 9 100 3 54 
0 10 100 4 49 
0 11 100 5 55
0 12 100 6 55 
0 13 100 7 47 
0 14 100 8 42 
0 15 100 9 38 
0 16 100 10 50 
25 1 100 11 46 
25 2 100 12 42 
25 3 100 13 44 
25 4 100 14 30 
25 5 100 15 38 
25 6 100 16 31 
25 7 100 17 42
25 8 200 1 36 
25 9 200 2 37 
25 10 200 3 27 
25 11 200 4 38 
25 12 200 5 25 
25 13 200 6 29 
25 14 200 7 30 
25 15 200 8 30 
25 16 200 9 37 
50 1 200 10 28 
50 2 200 11 37 
50 3 200 12 29 
50 4 200 13 36 
50 5 200 14 34 
50 6 200 15 27 
50 7 200 16 32 
50 8 200 17 37 
50 9 200 18 30 
50 10 200 19 31 
50 200 20 30 
Table 5.46 Results of the experiment with earthworms

                    Tillage
                    Chisel plowing               No tillage
                    Input level                  Input level
Crop      Manure    Low           Conventional   Low           Conventional
Corn      Yes       5, 5, 4, 2    5, 1, 5, 0     8, 4, 6, 4    14, 9, 9, 6
          No        3, 11, 0, 0   2, 0, 6, 1     2, 2, 11, 4   15, 9, 6, 4
Soybean   Yes       8, 6, 0, 3    8, 4, 2, 2     2224327       5, 3, 6, 0
          No        8, 5, 3, 11   2, 6, 9, 4     7, 5, 18, 3   23, 12, 17, 9
Table 5.47 Results of the callus tissue experiment (number of shoots per callus)

           Callus
Genotype   1 2 3 4 5 6
1 0 0 0 0 3 0 
2 9 0 1 5 2 4 
3 2 4 4 0 4 0 
4 1 2 5 9 0 4 
5 6 3 8 3 5 9 
6 6 2 4 4 2 7 
7 0 2 0 0 1 0 
8 1 1 3 1 0 2 
9 3 3 1 0 6 2 
10 3 6 4 7 1 8 
11 2 6 8 8 7 5 
12 0 0 3 2 10 6 
13 9 3 5 5 6 4 
14 2 3 2 0 3 2 
15 0 0 0 0 1 1 
16 5 4 4 7 7 1 
17 1 0 0 0 0 1 
18 0 1 0 0 1 0 
19 1 4 6 2 0 7 
20 4 3 5 18 4 0 


(a) Write down the analysis of variance table (sources of variation and degrees of 
freedom). 

(b) Write down the components of the GLMM. 

(c) Analyze the dataset with the model proposed in (b). 

(d) Reanalyze the dataset using the same model as above, but now assume that the data have a negative binomial distribution.

(e) Compare and contrast the results of these analyses.

(f) Summarize the relevant results. 
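A descriptive look at the two sources of variation can precede the GLMM. The sketch below runs a one-way ANOVA on the raw counts of Table 5.47. It computes a method-of-moments estimate of the between-genotype variance. It also computes the ratio of the pooled within-genotype variance to the mean, which should be near 1 if calluses within a genotype vary as Poisson counts. This is an illustrative count-scale summary under these assumptions, not the GLMM of part (b), whose variance components are on the log scale.

```python
# Table 5.47: shoots per callus, six calluses for each of 20 genotypes
shoots = [
    [0, 0, 0, 0, 3, 0], [9, 0, 1, 5, 2, 4], [2, 4, 4, 0, 4, 0], [1, 2, 5, 9, 0, 4],
    [6, 3, 8, 3, 5, 9], [6, 2, 4, 4, 2, 7], [0, 2, 0, 0, 1, 0], [1, 1, 3, 1, 0, 2],
    [3, 3, 1, 0, 6, 2], [3, 6, 4, 7, 1, 8], [2, 6, 8, 8, 7, 5], [0, 0, 3, 2, 10, 6],
    [9, 3, 5, 5, 6, 4], [2, 3, 2, 0, 3, 2], [0, 0, 0, 0, 1, 1], [5, 4, 4, 7, 7, 1],
    [1, 0, 0, 0, 0, 1], [0, 1, 0, 0, 1, 0], [1, 4, 6, 2, 0, 7], [4, 3, 5, 18, 4, 0],
]

g = len(shoots)      # number of genotypes
r = len(shoots[0])   # calluses per genotype (balanced design)
n = g * r
grand_mean = sum(map(sum, shoots)) / n
geno_means = [sum(row) / r for row in shoots]

# Between- and within-genotype sums of squares and mean squares
ss_within = sum((y - m) ** 2 for row, m in zip(shoots, geno_means) for y in row)
ss_between = r * sum((m - grand_mean) ** 2 for m in geno_means)
ms_within = ss_within / (n - g)
ms_between = ss_between / (g - 1)

# Method-of-moments (ANOVA) estimate of the between-genotype variance, count scale
var_genotype = max(0.0, (ms_between - ms_within) / r)
# Pooled within-genotype variance relative to the mean; about 1 under a Poisson model
dispersion_within = ms_within / grand_mean

print(f"grand mean = {grand_mean:.3f}")
print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"genotype variance (count scale) = {var_genotype:.3f}")
print(f"within variance / mean = {dispersion_within:.2f}")
```

With these data, the within-genotype variance is a little more than twice the mean (about 6.46 versus 3.02). This hints at extra-Poisson variation among calluses and is one reason to try the negative binomial model in part (d).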
Exercise 5.3.4 In an experiment at the Research Institute for Animal Production “Schoonoord” in the Netherlands, the effects of active immunization against androstenedione on the fertility of Texel ewes were studied (Engel and te Brake 1993). The number of fetuses per ewe can be considered the net result of a process that determines the number of ovulations and a probability process by which these ovulations produce fetuses. In this study, the goals are to model and analyze (a) the number of ovulations and the number of fetuses in relation to Fecundin (androstenedione-7α-carboxyethylthioether) treatment, animal age, and mating period, and (b) the number of fetuses in relation to treatment, animal age, and the number of ovulations observed. A summary of the experiment and a summary of the data are shown below (Table 5.48).


Of the 125 Texel ewes, 63 were treated with Fecundin, and the remaining 62 served as a control group. The ewes are sorted into four age classes (<0.5, 0.5–1.5, 1.5–2.5, and >2.5 years) and two mating periods (starting on October 1 and October 22, 1986, respectively). Interactions with age are of interest; because age enters the model as a factor (age class) rather than as a covariate, these interactions are easier to handle. The numbers of animals in the four age classes are 25, 44, 24, and 32, respectively. Age classes are evenly distributed across the combinations of mating period and treatment group. Ewes were slaughtered 75–80 days after the last mating, and the number of ovulations and the number of fetuses were determined. Ovulation numbers ranged from 1 to 5. For six animals, the number of ovulations was not known, so these ewes were excluded from the dataset.


(a) Analyze the dataset using a GLMM with the linear predictor $\eta_{ijkl} = \mu + \tau_i + \alpha_j + \beta_k + (\tau\alpha)_{ij} + (\tau\beta)_{ik} + b_l$, where $\tau$, $\alpha$, and $\beta$ are the fixed effects of treatment, age, and mating period, respectively, and $b_l$ is the random effect of animal $l$. Assume that the $b_l$ are normally distributed with zero mean and variance $\sigma_b^2$, and that the number of ovulations and the number of fetuses have a Poisson distribution.

(b) From the analyses performed, do you observe overdispersion in the dataset? If so, propose an alternative distribution for analyzing this dataset.

(c) Reanalyze the dataset using the same model as before with the new data 
distribution. 

(d) Compare and contrast the results of these analyses. 

(e) Summarize the relevant results. 
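For part (b), note that the animal random effect in part (a) already makes the marginal distribution of the counts overdispersed. Write the linear predictor as $\eta_{ijkl} = \eta^{F}_{ijk} + b_l$, where $\eta^{F}_{ijk}$ (notation introduced here) collects the fixed effects. The conditional Poisson assumption, $b_l \sim N(0, \sigma_b^2)$, the laws of total expectation and total variance, and the lognormal moments $E(e^{b_l}) = e^{\sigma_b^2/2}$ and $E(e^{2b_l}) = e^{2\sigma_b^2}$ then give the following sketch of the marginal moments:

```latex
\begin{aligned}
\mathrm{E}(Y_{ijkl}) &= \mathrm{E}\big[\mathrm{E}(Y_{ijkl}\mid b_l)\big]
  = e^{\eta^{F}_{ijk}}\,\mathrm{E}\big(e^{b_l}\big)
  = e^{\eta^{F}_{ijk} + \sigma_b^2/2} \equiv m_{ijk},\\
\mathrm{Var}(Y_{ijkl}) &= \mathrm{E}\big[\mathrm{Var}(Y_{ijkl}\mid b_l)\big]
  + \mathrm{Var}\big[\mathrm{E}(Y_{ijkl}\mid b_l)\big]
  = m_{ijk} + e^{2\eta^{F}_{ijk}}\big(e^{2\sigma_b^2} - e^{\sigma_b^2}\big)
  = m_{ijk} + m_{ijk}^2\big(e^{\sigma_b^2} - 1\big).
\end{aligned}
```

The variance exceeds the mean whenever $\sigma_b^2 > 0$. When each animal contributes a single count, the animal effect therefore acts like an observation-level random effect that absorbs extra-Poisson variation. A Pearson chi-square/df well above 1 after fitting would still point toward a negative binomial alternative.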


Exercise 5.3.5 The following example deals with one of the most harmful insects attacking the root system of the main crops, commonly known as the “blind hen.” The experiment consisted of six treatments (A, B, C, D, E, and F) formulated for larval control and arranged in randomized blocks. The counts per area give the number of larvae in two age groups (a and b) (Table 5.49).
Table 5.49 Results of the blind hen experiment

Block   A       B       C       D       E       F
        a   b   a   b   a   b   a   b   a   b   a   b
1       5   7   5   14  0   3   1   7   1   10  4   13
2       4   2   12  5   2   3   1   6   3   5   4   1
3       4   4   1   14  2   2   1   7   1   8   7   10
4       1   5   5   9   2   7   3   7   0   3   3   12
5       2. УЛ Ес 80015141 6 |1 8


(a) Write down the analysis of variance table (sources of variation and degrees of freedom).

(b) Write down the components of the GLMM.

(c) Analyze the dataset with the model proposed in (b).

(d) Does the model proposed in (b) adequately describe the variation observed in the dataset? Summarize the relevant results.


Appendix 1 
Data: Subcultures 
subl Repl NB subl Repl NB subl Repl NB subl Repl NB
1 1 18 3 2 24 6 1 45 8 9 53 
1 2 16 3 3 24 6 2 44 8 10 59 
1 3 15 3 4 19 6 3 45 8 11 57 
1 4 15 3 5 25 6 4 44 8 12 65 
1 5 11 3 6 24 6 5 52 8 13 63 
1 6 17 3 7 20 6 6 47 8 14 55 
1 7 10 3 8 24 6 7 46 8 15 50 
1 8 8 3 9 20 6 8 45 8 16 52 
1 9 17 3 10 19 6 9 48 8 17 55 
1 10 13 3 11 26 6 10 56 8 18 50 
1 11 16 3 12 22 6 11 54 8 19 53 
1 12 15 3 13 23 6 12 44 8 20 52 
1 13 12 3 14 24 6 13 54 9 1 48 
1 14 15 3 15 23 6 14 62 9 2 44 
1 15 8 4 1 24 6 15 55 9 3 54 
1 16 8 4 2 28 6 16 45 9 4 55 
1 17 15 4 3 29 7 1 56 9 5 51 
1 18 15 4 4 34 7 2 62 9 6 58
1 19 14 4 5 24 7 3 45 9 7 47 
1 20 8 4 6 24 7 4 45 9 8 42 
2 1 15 4 7 25 7 5 46 9 9 50 
2 2 11 4 8 28 7 6 48 9 10 48 
2 3 12 4 9 24 7 7 55 9 11 48 
2 4 18 4 10 32 7 8 45 9 12 53 


(continued) 
NB


54 
59 
58 
46 
38 
29 
30 
31 
33 
35 
59 
37 
44 
42
41 
45 
38 
40 


Repl
13 
14 
15 


10 
11 
12 
13 
14 
15 


subl 


10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 
10 


NB 


44 
52 
45 
43 
58 
62 
45 
63 
56 
55 
50 
53 
58 
56 
50 
57 
60 
50 
52 


Repl


10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 


subl 


NB 


34 
30 
26 
21 
29 
44 
54 
57 
51 
54 
51 
62
53 


Repl


11 
12 
13 
14 
15 
10 
11 
12 
13 
14 
15 


subl 


NB 


19 
19 
24 
12 
12 
11 
21 
10 
15 
20 
22 
20 
13 
18 


Repl


10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
Count


Treatment 


TR 
TR 
TR 
TR 
TR 
TR 
TR 


Column 


Row 
Data: Weed counts


Block 


Count 


14 
14 
10 
13 
20 
53 
21 
12 
31 
32 
22 
49 
16 
14 
20 


Count 
20 
16 
19 
31 
11 

30 
29 
25 
11 
15 

23 

22 

20 

28 

18 

18 

55 

58 

18 

19 

14 

44 
19 
17 
12 
44 
29 
49 
99 
66 
11 
15 


Block 
Count


21 
49 
49 
17 
22 
41 
21 
48 
11 
58 
34 
28 
20 
20 
10 
29 
22
22
31 
32 


41 
112 


24 
28 
11 
10 
117 


44 


Block 


78 
36 
38 
9 9 r © tr © © © © © е IHO [49] v[oq
9 [4 r r r © 0 © е IHO [49] оҷ 

И 01 9 с r 01 8 9 r с © е IHO ZO тоя 
6 8 L 8 01 8 9 r 9 T T cA IHO IO efoy 
6 I L I 8 8 9 9 r © © е IHO IO зо 
а с 1 L ç 8 9 9 9 © © е IHO 12 оч 
1 VI r r 8 8 8 9 r [4 [4 TA IHO Jd vloy 
И II ç ç 01 01 8 9 ն © 0 TA THO jd efoy 
TI ЕТ а 8 L 8 8 9 Ӯ © © TA IHO Jd тоз 
9 6 ç ç 9 01 8 9 r © [4 TA IHO со тоя 
6 II а 01 8 01 8 Ӯ Ӯ © © cH THO so юз 
6 1 ç r ç 8 01 9 9 © [4 TA IHO со vloy 
01 1 с с 9 01 8 9 с © 0 1 աշ то vloy 
L L L € Ӯ 9 9 9 9 © © TA IHO 149) vloy 
8 € © © ç 01 9 9 Ӯ [4 с TA IHO то vloy 
ç И © r 8 01 8 ғ с [4 0 TA IHO £O vloy 
SI ç ç ç 01 01 8 r r 0 0 TA IHO £O К 
ПА OIA 64 gÁ LA 9А çÁ pa çÁ СА ТА dex Келі, Əuo[O əpeus 
(continued)
9 I ç ç ç 9 9 r tr [4 [4 TA THƏ [49] зо 
H L I I © 8 8 tr r © @ TA THƏ [49] оу 
VI Я! TI 01 П 01 8 9 r © © TA THƏ IO vloy 
8 6 6 П И 01 01 9 r с © TA CHO IO тоя 
ГА! 01 r 0I ç 8 9 ç ç [4 [4 IH CHO 12 тоя 
91 91 ç с 8 01 8 9 9 Ӯ © ҮЧ IHO Jd vloy 
9 € Ӯ Ӯ L 8 8 ç r © 0 D THO Jd оу 
TI TI ç © 8 ն ç 9 Ӯ 0 ҮЧ IHO Jd [оҷ 
© с с ғ 6 01 8 9 с 0 © ra IHO со vloy 
I I © ç ç 9 Ӯ Ӯ Ӯ 0 © ҮЧ IHO so vloy 
ç 01 Լ ç 9 01 01 ç Ӯ Ӯ © ҮЧ IHO so ԱԾԵԼ 
6 6 ç 9 8 8 9 ç Ӯ © © РУ IHO ՆԹ) оу 
L 9 © © ç r © © © 0 © ra IHO rO тоя 
1 1 © 1 1 © r ç ғ 0 © РУ IHO rO тоя 
с Ӯ ГА 9 ГА L 9 r © 0 © ҮЧ IHO £O vloy 
9 Ӯ 9 8 Ӯ 01 8 ғ Ӯ 0 © ҮЧ IHO £O зо 
© 1 1 1 ç r r © € 0 © ra IHO £O vloy 
6 8 9 9 9 r © [4 ré [4 [4 ҮЧ IHO ZO тоя 
€ € 9 [4 с с Ӯ © с 0 © ra IHO (49) vloy 
I 9 9 r © @ © © ru THO ZO efoy 
01 [4 [4 ç а 8 8 9 9 © 0 ra IHO TO ԱԾԵԼ 
TI TI © Ге 01 8 01 9 9 © © ra IHO լթ) оҷ 
Ӯ © ç ç © 8 8 ç r © ré ra IHO IO vloy 
E £c 8I с с 9 8 8 9 9 € € ЕЯ IHO Jd ШУ: 
8 91 а с ç 9 8 8 9 9 © © е IHO Jd vloy 
5 91 TI ա © 01 01 Ӯ с Ӯ © © е IHO Jd тоя 
= П TI 01 6 а 01 8 9 9 © © е IHO со [оҷ 


ծ 21 ն B 8 8 8 8 r r © © TA CHO YO тоя 
8 8 L 6 ç © 8 8 r r 0 © TA CHO նթ) vow 
4 6 01 © 2 r 01 8 r r [4 [4 TA CHO т víow 
Է 01 01 9 9 v © © TA CHO £D (оу 
> 9 ç r 8 r r ГА 0 [4 TA CHO £O тоя 
Է 01 01 9 r 9 © 0 TA CHO £O 2051 
T © © 01 8 9 r @ 0 © TA CHO (69) ԱԾԵԼ 
Է 1 а 01 8 9 r © [4 TA THO թ víow 
= 6 8 © [4 r 01 9 9 r © © TA THO TƏ тоя 
8 ç ç ç ç 01 01 8 9 9 © [4 TA THO 12 тоя 
Е и 01 9 9 ç 8 8 9 ӯ 0 © TA 7HO IO тоя 
8 v € ç n t 8 8 9 t 0 © TA сно IO ЕП 
- 9 6 01 lI 01 L Ӯ Ӯ © TH THO jd воч 
ГА 8 4 с 9 а 8 9 9 © © TH THO jd ео 
01 6 6 а 01 01 8 9 с © © TA CHO Jd о 
91 1 с 8 01 8 8 9 9 2 0 TA сно со тоя 
SI 01 6 6 а 01 01 9 9 © © К оно со тоя 
6 01 r ç 6 01 01 9 9 © © К (нә со тоя 
и 6 6 01 01 01 8 9 r [4 [4 TA THO 72 тоя 
ФТ ФТ ФТ а 21 01 8 9 r [4 [4 TA оно rO тоя 
6 8 с 9 9 01 8 r r 0 © TH THO rO vlow 
9 9 6 01 8 9 Ӯ © © ТЯ THO £O воч 
ա ն ç 9 Il 01 01 9 9 © © ra THO £O тоя 
1 1 1 L 01 8 9 r [4 [4 TA THO £O тоя 
1 © с © 6 8 8 9 9 [4 [4 TA (но 22 тоя 
ПА OIA 64 gk LA 9А çÁ pa çÁ СА ТА dex Келі, Əuo[O opeys 
(continued)
91 VI I 8 а 01 8 9 tr © © D CHO 12 о 
6 9 ç 9 8 01 8 9 r © [4 РУ THO 12 тоз 
9 с © [4 01 8 r ç r 0 с ra CHO 12 о 
УІ а L 9 ç 01 01 9 9 С T cA THƏ Jd зо 
с а І I 9 а 01 9 с ç © е THO jd ԱԾԵԼ 
и H L L и а 01 9 с ç © е THO jd vloy 
© ç ç r 9 01 8 8 9 © © е THO so КШ 
[4 I [4 € 9 01 8 r Ӯ 0 © ем CHO so о 
01 а с с 9 01 8 с 9 [4 [4 е CHO so vlow 
I I I Ӯ 8 8 9 9 @ © е THO то воч 
1 © 9 9 8 ç 9 © © е THO 149) ео 
Ӯ 8 8 9 Ӯ © © е THO ՆԹ) тоз 
П 8 1 © 01 01 01 9 9 с © eu THO £O vow 
I 01 01 01 9 9 © © eu CHO £O тоя 
01 а 8 9 Ӯ © © е THO £O vloy 
8 L с 6 8 01 8 9 r 0 © ЄМ CHO TƏ тоя 
01 а 8 с r © 0 е THO (49) v[oq 
I I ç 01 8 9 9 © (4 е THO TO vloy 
I © с 9 01 8 с r © © е CHO 12 vloy 
r I I I [4 0I 0I 9 r с © eu оно 12 тоя 
© ա 01 01 9 9 (8 © е THO լթ) ԱԾԵԼ 
8 ç r r 01 01 8 9 v с © TA CHO Jd vloy 
TI ç 9 ç 8 01 8 ç 9 ç ré TA CHO Jd vloy 
m Ӯ с 1 1 ա а 8 9 9 Ӯ e TA THO jd ԱԾԵԼ 
8 9 r с 9 1 01 01 9 9 [4 [4 TA CHO со тоя 
5 VI €T 8 01 01 01 8 9 9 © © TA THO so тоя 
= r ç ç 9 9 8 8 9 r © 0 TA THO со [оҷ 


Š ç ¿ 9 ¿ 6 8 8 r n z z ոլ օթ Р "он 
8 ФТ II 6 L 6 01 8 9 r © © TA €HO £O vow 
2 01 £ £ s S 8 8 ? ն 0 с TH 325) £O “он 
% 8 с 9 01 01 8 6 9 r [4 [4 TA €HO £O тоя 
> 01 6 6 6 8 8 L ç ç © [4 TA €HO TƏ тоя 
E [4 [4 Ӯ ғ 01 8 9 с с 0 © TA €HO թ (оу 
T 8 6 6 6 8 8 9 9 r 0 [4 IH €HO TƏ тоя 
8 1 с ç [4 L 8 9 r ré [4 [4 TA €HO IO víow 
= r L L 9 01 8 8 9 ç [4 [4 TA €HO 12 тоя 
8 ЕТ r r 1 6 01 1 9 r 0 с TA £HO 12 тоя 
Е 02 21 01 01 6 01 01 ն ն € © РУ 7HO jd тоя 
8 m 9 L L 6 01 8 9 t 0 © vu сно м ЕП 
- I ç ç с 8 8 8 с Ӯ ç © ҮЧ THO jd зо 
1 1 8 1 01 01 8 8 9 [4 [4 уч THO со тоя 
r ç I © 9 8 8 r r 0 [4 va THO со тоя 
01 1 1 L 8 8 01 9 9 © 0 РУ CHO so тоя 
8 Ӯ 9 6 6 01 8 9 Ӯ 0 © ҮЧ THO 149) vloy 
а L 9 L 01 01 8 9 9 © © ҮЧ THO 149) vloy 
01 ç L 8 6 8 01 9 9 [4 [4 va THO YO тоя 
8I 8 6 6 и 6 01 9 9 [4 © va оно £O тоя 
6 1 6 6 а 01 8 9 ғ 0 © ҮЧ THO £O тоя 
9 01 ç ç 8 8 8 9 9 © © ՖՎ THO £O тоя 
Y Y 9 9 8 8 9 Ӯ Ӯ с 0 ҮЧ THO (69) зо 
6 01 9 r ç 8 8 9 ç 0 [4 va THO TƏ тоя 
ЕТ 1 ç r 8 8 8 9 9 [4 [4 va оно TƏ тоя 
ПА OIA 64 gÁ LA 9А çÁ pa çÁ СА ТА day Келі, Əuo[O opeys 




Coffee data (continued)






(continued)

8 а 6 01 01 01 8 9 с © © е €HO 12 зо 
0c VI 01 6 01 01 8 9 r © [4 TA €HO jd оу 
LI ç 01 6 8 01 8 9 r 0 © TA €HO jd [оў 
УІ SI а II 01 01 8 9 ն © © TA €HO jd ԱԾԵԼ 
а 6 8 0I 0I 0I 8 9 9 [4 [4 TA €HO so тоя 

8 8 8 9 r © © TA €HO so vloy 

SI а 8 8 а 01 01 9 9 © 0 TA €HO so К 
01 01 ç 9 8 8 8 9 Ӯ © © TA €HO ՆԹ) о 
@ © 8 6 9 9 © © TA €HO то vloy 

L 8 8 9 Ӯ © © TA €HO 149) юз 

91 а 8 H 6 01 01 8 9 © © TA €HO £O [о 
© 1 П а 01 01 01 9 9 1 © TA €HO £O оу 

ç ç ç I 8 8 01 9 r © © 1 €HO £O тоя 

6 9 8 8 ғ ғ 0 © TA €HO ZO тоя 

8 8 6 8 8 9 © © © © TA €HO (69) vloy 

6 а I lI а 01 8 9 Ӯ © © TA €HO (69) зо 
LI VI L L а 01 8 9 ғ 0 © TA €HO լթ) vloy 
а 01 6 01 с 8 8 9 r 0 © 1 €HO IO тоя 
01 r r ç 01 8 8 9 r 0 © TA €HO 12 vloy 

[A I Ӯ Ӯ 9 01 8 9 Ӯ © © ТЯ €HO jd vloy 
01 9 8 ¿ 8 01 6 9 ա © © TA €HO jd ԱԾԵԼ 

9 ç ç 9 01 01 8 9 9 © 0 TA €HO jd vloy 

Ӯ ç 01 8 9 9 0 0 TA €HO «2 vloy 

6 01 L ç ç 8 9 r r © © TA €HO со vloy 

€ € с 9 6 0I 8 9 r [4 [4 IH €HO со тоя 

© L 01 8 8 9 Ӯ © © TH €HO 149) тоя 
8 L 8 6 r ç 0 © TA €HO ՆԹ) [оҷ 
EZ 91 9 6 ç 8 L ç ç с © ra €HO £O vloy
[44 gI 9 L И 01 8 9 9 @ © РУ €HO £O vow 
I r 9 01 L 9 9 [4 [4 уч €HO TƏ víow 

9 0I 9 01 и 6 8 9 r 0 © va €HO TƏ тоя 

8I 9I 8 6 а 01 8 9 r 0 [4 va €HO TƏ тоя 
8 9 1 с 01 8 8 9 r 0 © ra €HO 12 vloy 

L Га 9 01 9 а 8 с Ӯ © © ҮЧ €HO IO vloy 

9 € € r 8 8 8 9 9 © © ҮЧ €HO 12 v[oq 

VI 9 [4 ç 9 01 8 9 Ӯ © © е €HO jd elfog 
VI 01 1 © ç 9 9 9 ն © © Е €HO jd оҷ 
© 1 8 01 ç 9 9 9 ն © © е €HO jd vloy 

ç Ӯ 01 8 с с © © е €HO so vloy 

r 01 1 8 а 01 8 9 r © © cA €HO со тоя 

LI © © [4 r 01 8 9 r © [4 е €HO со тоя 
с © © 9 9 8 9 r © z е £HO rO тоя 

© 1 € € ç 01 8 9 r © © eu €HO նթ) тоя 

Ӯ 8 L с Ӯ с © е €HO то vloy 

II I ն 8 L а 9 9 ғ © © е €HO £O зо 
П ç 9 8 r II 8 9 9 © © е €HO £O тоя 
21 и ЕТ ЕТ и 01 8 9 ç [4 с eu €HO £O тоя 
8 8 9 r € [4 [4 ЄМ €HO (49) vloy 

SI 6 6 TI 01 8 8 Ӯ Ӯ © © е 13210) (69) vloy 
I I 9 8 6 Ӯ 9 [4 © е €HO (69) зо 

8 [4 9 а 6 с с [4 [4 е ЕН2 12 тоя 

8I € с 9 01 01 6 с r [4 [4 ЄМ ЕН2 IO тоя 
9 r I РА © 9 8 9 r [4 [4 ռլ IHO Jd epod
ФТ ç [4 r 9 r 9 0 [4 TA THO Jd epod 
vI r 1 r 6 6 L ç r © [4 TA THO 2 epod 
а 9 1 8 4 r r © © TA IHO «2 epad 

8 r I [4 r 9 с 9 r с [4 IH IHO so epod 

€ ç 9 1 с r © [4 TA THO roO Եր» 

ç ç ç ç 9 ç r ç © TA THO 72 epod 

1 r L [4 с [4 0 [4 TA THO YO epod 

9 T 1 © с r L ç r 0 © TA IHO £O epad 
r I © ç © r [4 [4 IH IHO £O epod 

T 9 8 ç r [4 [4 TA THO £O epod 

r r с ç 0 [4 TA THO TƏ epod 

r r r 9 © © TH THO TO epad 

1 1 © © 9 ç r © © 0 © TA IHO (49) epad 
ա [4 [4 z 0 © IH THO 12 epod 

I ç 7 ç r 0 © Ta THO 12 epad 

1 © © € r 0 [4 TA THO 12 epod 

ЕТ 6 L 6 9 9 9 9 ç 0 0 ҮЧ €HO jd [оч 
£c H II 01 а 01 8 9 ն © © ՖԱ €HO jd точ 
vc 12 01 01 а 01 8 с ғ © © ҮЧ €HO jd воч 
91 ФТ 6 01 01 01 8 8 9 0 [4 va €HO со víow 
ФТ 21 01 с 1 01 01 8 9 РА © va ЕН2 6) тоя 
6 ФТ 01 6 ç 8 01 8 9 © с va €HO со тоя 
© с ç 9 01 6 8 9 1 e ba €HO то ԱԾԵԼ 
Ӯ с а 01 01 8 9 © © ҮЧ €HO 149) vloy 

и L r r 9 а 8 1 r [4 [4 уч €HO rO тоя 
ЕТ ГА! и 01 01 01 8 9 9 [4 [4 va €HO £O тоя 
r r 9 r r [4 [4 єч IHO TO epod

€ 8 8 9 ա Շ Շ ЕЯ IHO TƏ epad 

6 T r L 9 [4 [4 ЕЯ THO TO epad 

1 9 v r v 0 [4 єч THO IO epad 

1 T T ç r T T T T єч IHO IO epad 

1 1 1 r L 8 9 9 T © ЕЯ IHO IO рә 

I [4 r r 8 9 6 [4 [4 TA ІНО м epad 

8 € T 6 L 6 6 r [4 [4 ca THO jd epad 

1 T 6 6 r r € T TA IHO Jd ерәа 

ç T T r ç r € ç 0 T TA THO SD epad 
ç € € ç L 9 9 9 T T TA THO ЧӘ) рә 
r Е S r 6 r r r [4 [4 TA ІНО ЧӘ) epad 
T T T € 9 r I r [4 [4 TA ІНО 9) epad 
H € 1 r v 6 r r 0 [4 ca THD 9) epad 
H 8 I 1 L 9 9 r T T TA THO 119) ерәа 
ç ç с T r r € r 0 [4 TA IHO 59) epad 
9 S [4 r 9 8 r r [4 [4 TA ІНО 59) epad 
6 9 I € [4 r r r [4 [4 TA ІНО 59) epad 
€ [4 T 1 [4 0 [4 ca THO TO ерәа 

ç 8 T T T 0 0 ca THO TO epad 

8 9 r € € 0 0 ca THO TO epod 

T T r € r [4 [4 TA ІНО IO emd 

8 8 8 9 r [4 [4 TA ІНО IO emd 

9 8 [4 € r [4 [4 ca THO 12 epad 

€ € L 9 9 T T TA IHO Jd ерәа 

LA 9А çÁ ֆճ çÁ zÁ ТА doy Kei] Əuo[O epeus 
(continued)
ЕТ 8 ç ç ç L L ç Ӯ С © РУ IHO so epad 
ç ç ç r 9 8 L Ӯ Ӯ © [4 РУ IHO so epad 
6 8 L 8 8 8 6 L 1 r © ra IHO со epad 
r с 1 € L ғ ғ [4 [4 РУ IHO նթ) epod 
ç 9 Ӯ ç 0 [4 уч IHO rO epod 
а 6 с Ӯ Ӯ © © © 0 © ҮЧ IHO 149) epad 
Ӯ ç ç ç 0 © ҮЧ IHO £O epad 
ç © r © [4 0 [4 ra IHO с epad 
9 £ r r 0 © ra IHO £O epod 
© © с с r Š r r [4 [4 уч IHO TƏ epod 
Ӯ Ӯ ç ç ç 9 r tr 0 © ra IHO (49) epad 
T [4 [4 [4 0 [4 РУ IHO [49] epad 
9 Ӯ 9 € ғ 0 © уч IHO IO epad 
€ ç € ғ 0 © РУ IHO 12 epod 
r ա Ӯ Ӯ r с © уч IHO IO epod 
I I ç Ӯ с с 9 Ӯ © е IHO Jd epad 
TI 01 9 9 9 L L 9 ç Ӯ © е IHO Jd epad 
ç ն ç L 9 © © ed THO jd epad 
1 9 € € ç € L с ғ r © е IHO со epad 
01 9 © ç r ç 8 9 r 0 © eu IHO so epod 
а с 1 1 ç 6 L 9 tr © © е IHO so epad 
T ç © @ © ç 9 8 r © © е IHO то epad 
8 ç ç ç L L r © © е IHO ՆԹ) epad 
E 6 ç I I € ғ 6 с 9 © Е eu THO նթ) epod 
Е 01 [4 [4 I с r 8 ç r [4 [4 eu IHO £O epod 
5 6 8 ա ç ç 9 9 Ӯ 0 © е IHO £O epad 
= 9 r © © © ç ç ç 0 © е IHO £O epad 


ծ 01 8 8 8 9 r 0 [4 TA CHO TƏ emd 
8 [4 І І [4 ç 8 01 9 r [4 [4 TA CHO IO epod 
= 01 1 1 1 Е с 01 9 r < < TA CHO IO e[rod 
Е 1 r 01 8 01 9 9 < < ca CHO ID enad 
> ЕТ 6 9 9 IT L 01 9 r [4 [4 TA CHO jd epod 
E 6 L 1 (4 9 r 8 9 r [4 [4 TA CHO jd epad 
T II S I I [4 6 9 r r [4 [4 TA CHO jd epod 
Е IT 8 01 9 9 0 < TA CHO Кө] e[rod 
a 1 L с 9 r T T TA CHO МӘ) epod 
Е а L I r 01 8 9 r 0 [4 TA CHO ЧӘ) epod 
Е € а 01 01 9 9 [4 [4 TA CHO vO epad 
8 9 с r [4 [4 [4 T TA CHO 9) epod 
~ SI УІ 01 01 01 8 8 r r 0 (4 TA CHO 9) epod 
8 8 r r T T TA CHO €D epod 
9 8 8 9 r [4 [4 TA (4:0) 35) epod 
6 ç r r 8 9 8 9 r [4 [4 TA CHO 59) epad 
IT а 01 6 6 9 L r r 0 [4 TA CHO TO epod 
1 1 П 6 r 8 8 r r 0 [4 TA CHO co epod 
9I SI 1 а 6 01 01 9 9 < < TA CHO co e[rod 
ç 8 8 r r [4 [4 TA THƏ TIO epad 
6 8 8 9 r [4 [4 TA CHO IO epad 
а а а [4 6 6 8 9 r (4 (4 TA CHO I2 epod 
r (4 € r [4 8 9 9 [4 [4 уч IHO Jd epod 
r I I I [4 r 9 r r 0 [4 ҮЧ THO jd epod 
01 1 [4 [4 € r 9 ç r 0 [4 ҮЧ THO jd epod 
ПА OIA 64 gÁ LA 9А çÁ pa çÁ СА ТА day Келі, Əuo[O opeys 
П ç ç ç 01 8 8 Ӯ Ӯ 0 © е THO so epad
9 ç I I ç r 8 9 r 0 [4 е CHO ՆԹ) epad 
9 ç I [4 I 8 8 8 r © © е CHO ՆԹ) epad 

© © 6 L 8 8 ғ [4 [4 eu CHO YO epod 

6 ç Ӯ 8 8 8 Ӯ Ӯ 0 © е THO £O epad 
9 T T [4 6 ç L ա Ӯ 0 © е THO £O epad 
9 ç ç ç ç 8 8 Ӯ Ӯ © © е CHO £O epad 
T 9 8 9 Ӯ © © е CHO TO epod 

L 8 ғ r 0 0 eu (нә ZO epod 

L L ç r 0 © е THO (69) epod 

9 ա © ғ 9 8 8 9 9 © © е THO 12 epod 
6 8 T ç ç 8 8 9 r © © е THO 12 epod 
€ с с 1 8 9 Ӯ 0 © cA THO 12 epod 

9 ç I € 9 01 и 6 9 ç © TA THO jd epod 
1 1 І І ç Ӯ 01 9 r c © TA THO jd epad 
9 Y с 9 9 6 01 9 Ӯ 0 © TA THO jd epad 
1 6 8 8 r r I [4 TA THO со epad 

ç ç 9 01 9 Ӯ © © TA THO so epod 

а 6 1 9 01 8 01 с ғ [4 0 1 CHO со epod 
L с Ӯ Ӯ ГА 01 8 L Ӯ 1 © TA THO 149) epod 
6 L L 9 01 01 01 9 9 0 © TA THO 149) epad 
6 9 8 8 8 6 6 9 9 c [4 TA CHO ՆԹ) epad 

П 6 L 9 6 8 8 9 r [4 ré TA CHO £O epad 

01 8 ç 9 8 01 01 9 9 © ré TA THO £O epad 
ç ç Ӯ с 6 8 01 9 Ӯ © © TA THƏ £O epad 

1 9 9 9 8 01 8 9 Ӯ © © TA THO (49) epad 

L ç L 6 L 9 v 0 © TA THO (49) epad 
6 L 8 9 [4 [4 [4 TA Но epod

L 8 8 9 r 0 [4 TA £ HO epad 

с 8 01 8 9 r < ҮЧ CHO e[rod 

L с 8 9 r < < ҮЧ CHO e[rod 

r 8 8 9 r [4 [4 ҮЧ <HO epod 

6 8 8 9 ç r [4 [4 уч CHO epad 

I r [4 9 r [4 [4 уч CHO epod 

T с с 6 9 r 0 < ҮЧ CHO e[rod 

€ € r 9 L 9 r T T ҮЧ CHO epod 

T T 8 8 8 9 r [4 [4 ҮЧ (4:0) epod 

1 € L ç ç r r [4 [4 уч CHO epad 

I I L 01 9 r [4 [4 ҮЧ CHO epod 

8 9 r [4 [4 (4 уч CHO epad 

ն r r < 0 T ҮЧ <HO epod 

ç L 9 r 0 0 ҮЧ (4:0) epod 

T 9 8 9 r 0 [4 ҮЧ CHO epod 

9 I I I 9 € 9 r 0 [4 va CHO epod 
6 І I I € с L r [4 [4 уч CHO epod 
r © < Е 6 01 9 r < [4 ҮЧ CHO e[rod 
[4 r ç ç ն [4 ç r 0 [4 ҮЧ (4:0) epad 
ç I r 8 8 r [4 [4 cu CHO epad 

I с 9 8 9 r (4 (4 ЕЯ CHO epod 

r r 6 [4! L r [4 [4 ca CHO epod 

T 6 01 8 9 r 0 [4 ca CHO epod 

ç 6 6 9 [4 [4 0 [4 ca CHO epod 

gÁ LA 9А çÁ pa çÁ СА ТА doy Келі, opeys 
9 6 01 01 01 8 8 Ӯ Ӯ 0 © TA €HO 149) epad

6 VI 01 8 6 L 8 Ӯ Ӯ © [4 TA €HO ՆԹ) epad 
91 01 6 6 01 8 8 r r 0 © TA €HO £O epad 
01 6 9 9 01 8 8 Ӯ Ӯ 0 © TA €HO £O epod 
VI а 01 01 01 8 8 9 Ӯ © © TA €HO £O epad 
SI 01 01 6 01 8 8 Ӯ Ӯ 0 © TA 13510) (49) epad 

6 9 ç L L L L Ӯ ç 0 © TA €HO [49] epad 
91 а 8 6 с 8 01 9 Ӯ Ӯ © TA €HO TO epad 
91 01 8 01 01 8 8 ғ r 0 © TA €HO IO epod 
VI 6 8 8 ç с 6 9 Ӯ С © TA €HO IO epod 
VI 01 1 а а 01 9 9 ա © © TA €HO 12 epod 

ç ç ç ç 6 8 8 9 © © © TA €HO jd epod 
02 ФТ 9 1 а 01 8 9 r с © TA €HO jd epod 
LI И 1 01 9 8 8 9 ғ © © TA €HO jd epod 
II 01 Ӯ с 6 8 01 9 Ӯ с © ТЯ €HO so epod 
11 П и Il “а 0I 0I 9 r © © ТЯ €HO so epad 
91 01 8 а И L 01 9 r 0 © TA €HO со epad 
а Ӯ € r 9 с 9 Ӯ © © © TA €HO YO трәд 

8 © 9 8 8 8 ғ ғ © © TA €HO նթ) epod 

8 9 с с 8 9 8 9 Ӯ 0 © ТЯ €HO 149) epad 
а 8 4 L П 6 8 9 Ӯ 0 © TA €HO £O epad 
01 8 8 01 01 6 8 9 r 0 © TA €HO с epad 

r © ç ç H 01 8 9 r © ré TA €HO £O epad 
II 01 6 8 6 ç 8 9 r © Е TA €HO ZO epod 
II I 01 6 6 8 8 9 Ӯ © © ТЯ €HO (69) epad 
ЕТ TI П 6 01 9 8 r Ӯ © © TH €HO (49) epad 
91 П 01 а 6 01 L 9 r 0 © TA €HO 15 epad 


ծ SI II 01 01 8 6 8 9 r [4 [4 єч £ HO Jd epod 
8 9c [4! ç ç 8 01 8 9 r [4 [4 єч £ HO Jd epad 
= 81 УТ 9 L 6 8 8 9 9 r T ЕЯ €HO jd e[rod 
Է 91 IT r 01 ն 8 8 9 r < T ca 217 So enad 
> а 6 IT IT а 01 01 8 r 0 [4 ca €HO 2 epod 
E 61 01 6 L 8 9 8 r [4 0 [4 єч ЕНО ЧӘ) epad 
T ЄТ 8 с 9 6 L 01 r r 0 [4 ca НО 9) epod 
Է 9 01 8 8 9 r 0 < ca €HO 9) e[rod 
a r I (4 9 01 8 8 r r © T ca Но 9) e[rod 
Е 9 [4 I [4 L 8 01 9 r 0 [4 ca €HO 35) epad 
Е 1 € 6 ç 8 9 r [4 [4 єч ЕНО 59) epad 
8 6 8 8 8 01 8 01 9 r [4 [4 ca НО [29] epod 
~ L с 8 L 8 9 r [4 [4 0 (4 ca НО TO epod 
9 r r r 8 9 r T T 0 T ca €HO co epod 
с с € € € € [4 [4 [4 0 [4 ca €HO TO epod 
r ç 8 9 8 8 8 9 r [4 [4 єч £ HO IO epad 
УІ 6 r T T L 8 9 r 0 [4 ca Но IO epod 
1 9 © с r 9 8 9 r [4 (4 ca Но I2 epod 
81 УІ r 1 с L 8 9 r 0 © TA Но м epad 
(44 ЄТ а а 01 01 8 9 r [4 [4 ca €HO м epad 
Ic L ç 9 8 8 8 9 r 0 [4 TA ЕНО м epad 
8 6 r с 01 8 8 9 r 0 (4 TA Но 49) epod 
9I L r с а 01 8 9 r 0 [4 TA Но 49) epod 
IT 9 € L r 01 8 9 r 0 [4 TA €HO 2 epod 
L 6 6 а OT 8 9 r [4 [4 TA €HO то epod 
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IT 6 r ç ç 9 Ӯ ç © 0 © TA IHO £O BISON 
6 € r Ӯ ç 8 9 ç Ӯ © [4 TA IHO £O BISON 
01 9 © ç L 8 9 r r © © TA IHO с PRISON 
L ғ 8 6 ғ © [4 © 0 0 TA IHO ZO BISON 
Ӯ с 9 Ӯ Ӯ © [4 [4 0 0 IH IHO cO BISON 
9 L ç Ӯ 8 9 Ӯ © 0 © TA IHO (49) BISON 
8 8 9 r 0 © TA IHO IO BISON 
8 9 9 9 d [4 TA IHO 12 BISON 
8 9 9 r [4 T TA THO I2 BISON 
vl 01 r ç ç r 9 ç ç r [4 уч €HO jd epod 
8 ա © © ա ç L ç ç 0 © ra €HO jd epad 
0c 9T I I r 9 8 9 r © © РУ €HO jd epad 
8 L r ç ç ç 9 r ғ © © уч €HO so epod 
ç © 1 1 r 8 ç ç © © РУ €HO so epod 
ç с 6 Ӯ 9 © © уч €HO so epod 
I ç 9 8 9 9 0 © уч €HO նթ) epod 
1 1 r 8 9 r © © ҮЧ €HO ՆԹ) epad 
L с 1 1 © r 9 r ғ © © ҮЧ €HO նթ) epod 
8 ç L I I 8 9 9 r 0 © ra €HO £O epod 
Il r r r ç 1 8 9 r 0 © уч €HO £O epod 
1 6 8 8 9 tr © © ҮЧ 13510) £O epad 
9 ç ç 9 Ӯ [4 9 r r 0 © ra €HO (49) epad 
VI TI ç 9 9 8 9 9 ç 0 © ra €HO [49] epad 
E L € ç ç 6 8 L ç с © [4 ra €HO (49) epad 
8 9 8 с с с с 8 9 9 0 © уч €HO 12 epod 
5 8 L I © © L 01 9 ա 0 © ra €HO 12 epad 
= I I ç ç L 9 r © © ra €HO լթ) epad 
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ЕТ II © 8 9 9 ա © © TA IHO Jd VISON 
9 L 8 9 9 [4 [4 1 IHO «2 BISON 

ա 8 9 9 9 © © cH THO so BISON 

© 1 Ӯ 9 9 Ӯ Ӯ 0 © TA IHO so PRISON 

[4 [4 [4 [4 r 0 © TA IHO ՆԹ) BISON 

9 9 Y © © 0 T TA IHO YO BISON 

Ӯ Ӯ 9 Ӯ Ӯ © © TA IHO 149) BISON 

© 1 9 8 9 9 9 © © cH THO £O BISON 

€ 9 9 © © 0 © TA IHO £O BISON 

9 r r 9 9 © © TA IHO с BISON 

8 8 9 r T T TA IHO TƏ BISON 

r 8 9 Ӯ Ӯ © © TA IHO (69) BISON 

Ӯ Ӯ Ӯ 9 Ӯ 0 0 TA IHO (69) VISON 

9 9 9 r 0 © TA IHO լթ) vISON 

€ 8 9 9 r [4 [4 TA IHO 12 BISON 

6 L ç 8 8 8 9 9 © © TA THO IO VISON 

91 6 Y 8 8 8 9 9 [4 T 1 IHO Jd BISON 
сс 01 9 $ 9 9 9 Y r T T 1 IHO Jd BISON 
€T L ç 8 8 8 9 9 © © TA IHO Jd VISON 
8 9 r © 0 0 TA IHO со BISON 

8 9 Y Y T T TA THO so BISON 

8 r 8 8 9 Ӯ © © ТЯ IHO со BISON 

8 8 9 Y © T 1 IHO YO BISON 

9 9 r v 0 © TA IHO ՆԹ) BISON 

r 9 9 r 0 © ռլ IHO ՆԹ) PRISON 
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(pənunuoo) 


П ç © © ç 9 8 9 tr [4 [4 РУ IHO £O BISON 
I r 9 8 9 r © @ РУ IHO [49] BISON 

© © © [4 r 9 8 9 9 © © ra IHO [49] PRISON 
9 8 8 ç r © © ra IHO TO BISON 

а i I Ӯ I Ӯ 8 9 9 © © ҮЧ IHO IO vISON 
6 € € ç ա 8 9 Ӯ 0 © ҮЧ IHO 12 BISON 
01 ç ç r ç 9 8 9 9 © © РУ IHO 12 BISON 
81 81 1 © ն 9 9 9 9 ն © ЕЯ IHO Jd BISON 
LI ЕТ 9 9 9 8 8 9 9 r © е IHO Jd VISON 
8I 01 9 9 8 01 8 9 9 с © е IHO Jd VISON 
01 L I ա 8 6 8 9 9 © © eu THO so BISON 
[4 I ç Ӯ c 01 8 9 9 © © е IHO so BISON 
0c gI 9 9 8 9 9 © © е IHO «2 VISON 
I Y Y 9 r 9 [4 T cA IHO YO BISON 

© $ 01 8 9 9 © T cA IHO YO BISON 

€ ç 9 8 9 Y © © ЕЯ IHO YO BISON 

01 L I с 9 8 9 r © © е IHO £O BISON 
Ӯ 1 © 9 01 8 9 9 © © eu THO £O BISON 
© с 9 8 9 9 © © е IHO £O BISON 

€ 8 Y Y Y 0 © eu IHO cO BISON 

I ա © [4 © с 0 © е IHO (49) BISON 

8 r I ç r 8 9 © 0 © е IHO [49] BISON 
VI L r 9 9 01 8 9 r 0 © е IHO IO PRISON 
SI r 9 ç L 01 8 9 9 © [4 е IHO 12 BISON 
6 6 6 6 I 01 8 9 9 [4 T cA IHO IO BISON 
I ç © ç r 8 L 9 ա © cH THO Jd BISON 

01 L Ӯ € r 9 8 9 9 r © TA IHO Jd PRISON 
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6 01 L r © © TA CHO со BISON 

L 1 ç 01 8 9 9 [4 [4 TA CHO so VISON 
ç Ӯ 1 01 8 01 tr 0 © TA THO 149) BISON 
8 6 L ç 8 9 9 © © TA THƏ ՆԹ) PRISON 
€ © ç L 8 9 r © © TA THƏ ՆԹ) BISON 
8 9 ғ © © TA CHO с BISON 

9 8 9 Y 0 © [| CHO 50) BISON 

ç 8 9 9 0 © TA THO £O BISON 

а 1 © r 8 Ӯ © © © 0 TA THO [49] UISON 
6 6 ç ç 9 9 r Ӯ © © TA CHO [49] BISON 
ç [4 [4 L 8 8 9 r @ 0 0 TA CHO TO BISON 
с I I I 6 8 8 9 9 © © ТЯ THO IO vISON 
8 Ӯ ç ç 6 L 8 9 Y 0 [4 1 CHO IO BISON 
01 L ç ա 01 9 8 9 r © © TA THO լթ) vISON 
VI 01 9 ç r r r r 9 © © ra IHO Jd BISON 
И 8 1 1 9 9 8 ç 9 с © ҮЧ IHO Jd VISON 
9T I Ӯ с 9 І І 9 9 Ӯ © ҮЧ IHO Jd BISON 
01 L Ӯ Ӯ с 9 8 9 9 © © ҮЧ THO so VISON 
H ç ç ç ç 8 8 ç 9 © © ra IHO so vISON 
SI 8 r r 8 9 9 r 9 0 © ra IHO so PRISON 
8 6 9 8 9 Ӯ © © ra IHO то BISON 
ç © 9 9 Ӯ Ӯ © © с 0 © ҮЧ IHO YO VISON 
Ӯ ç G © ç ç 8 9 9 © © ՖԱ IHO YO BISON 
9 ç ç ç ç с 8 r Ӯ 0 © ra IHO £O PRISON 
© © ç ç 6 с 9 9 r 0 © ra IHO £O BISON 
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(pənunuoo) 
© 9 9 8 9 tr 0 © е THO [49] тоқ 
@ ç с 8 8 9 r 0 [4 е THO [49] BISON 
а 9 © ç 01 8 8 9 r © © е CHO 12 PRISON 
Ֆլ 8 ç L 01 r 01 9 9 © © е CHO 12 BISON 
а rd ç ¿ 1 8 8 9 9 0 [4 ЄМ CHO IO BISON 
9 L 8 9 Ӯ 0 © TA THƏ Jd BISON 
€ 01 8 9 r 0 © TA THƏ Jd BISON 
0c 01 6 8 1 01 9 9 ն © © TA THO jd BISON 
01 8 8 r © © TA THƏ со BISON 
I 01 8 01 а 01 8 9 9 © © TA THO so BISON 
L L 9 L 6 01 8 9 9 © © TA THƏ so BISON 
9 L 01 01 L 01 01 9 9 © © TA THƏ ՆԹ) BISON 
r L L 01 6 L 8 9 9 0 © TA CHO նթ) VISON 
© 9 1 01 6 8 ғ ғ 0 © TA CHO YO VISON 
01 8 8 9 Ӯ © © TA THƏ £O BISON 
r @ 01 8 9 Ӯ Ӯ 0 © TA THƏ £O VISON 
€ 9 8 9 r © © TA THO £O BISON 
ç 8 9 Ӯ Ӯ © 0 TA THO ZO VISON 
01 r L 6 6 L 8 r r © © 1 CHO (49) BISON 
9 8 8 с 8 9 9 © © TA THO TO vISON 
VI L © [4 9 8 9 9 Ӯ © © cH CHO լթ) BISON 
© с © С 8 6 8 9 9 с © TA CHO 12 BISON 
8 € 8 8 8 L 8 9 r 0 ré TA THO լթ) BISON 
а c € € 9 9 8 9 9 c fé TA THƏ Jd BISON 
18! С I ç Ӯ 9 8 9 9 ç © TH THO jd BISON 
01 6 tr ա ç 9 8 9 r 0 © TH THO jd BISON 
© 9 9 9 L 01 01 8 r © © TA THO со PRISON 
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ç © r 9 8 r 8 ғ 9 © © ҮЧ CHO YO VISON 

r © € r L 01 8 9 9 0 © ra CHO то BISON 

I € 6 8 9 9 9 © © ҮЧ THO 149) BISON 

r ç ç 8 © 9 tr r 0 © РУ THO £O PRISON 

ç 9 8 9 9 © © ra CHO с BISON 

ré 8 8 9 9 [4 [4 ra CHO £O BISON 

© © 8 9 9 © © ҮЧ THO (69) BISON 

I © ա r r 0 © ҮЧ CHO (49) BISON 

L Ӯ Ӯ Ӯ ç ç 9 Ӯ © 0 © РУ CHO [49] UISON 

€ 8 9 9 [4 [4 ra CHO լթ) BISON 

© @ L 8 8 9 9 с © ra CHO 12 BISON 

8 8 9 9 © T ҮЧ CHO լթ BISON 

а 01 Ӯ 6 6 8 8 8 9 0 © е THO jd BISON 
8T ЕТ с 9 8 01 8 9 Ӯ 0 © е CHO Jd VISON 
VI 01 L П а 01 8 9 r 0 © е CHO Jd PRISON 
gI 6 9 8 8 а 8 8 9 © © е CHO со BISON 
L 8 8 8 Ӯ Ӯ с © е THO so BISON 

11 6 ç ç I 0I 8 9 r © © е оно so BISON 
L © 1 © L 9 9 9 r © © е THO то vISON 

L 01 8 9 9 © © eu THO ՆԹ) PRISON 

ç ç r 9 L r 8 9 9 © © е CHO то BISON 

9 Y Y Y 8 9 8 9 r © © е THO £O [БЕМ 

а 8 С с с Ӯ 9 Ӯ Ӯ [4 © е THO £O VISON 
VI а ç r 6 01 8 9 9 © © е CHO £O PRISON 
fé r L ç 9 9 Ӯ [4 © е CHO [49] BISON 
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(pənunuoo) 
9 с ç ç L 9 01 9 9 с © TA €HO 12 тоқ 
с 1 r r 9 9 8 9 r 0 [4 TA €HO լթ) BISON 
[4 8 01 8 9 r 0 © TA €HO լթ) PRISON 
L I € ç 8 01 8 9 r 0 © TA £HO Jd BISON 
01 01 01 L L 01 8 9 9 © © ТЯ €HO jd VISON 
6 ç 6 ГА 6 8 9 8 9 © © TA €HO jd BISON 
а 01 Լ 9 9 ç 8 8 9 © © TA €HO so PRISON 
6 6 L 8 6 6 8 9 9 с © TA €HO «2 BISON 
€ r L 8 6 01 8 9 9 © © TA €HO so VISON 
lI I 6 8 6 9 L 9 Ӯ © © ТЯ €HO 149) BISON 
9 9 8 9 Ӯ Ӯ © © TH €HO 149) BISON 
6 L 9 8 6 01 8 9 r r © TA €HO ՆԹ) BISON 
9 И 01 01 8 01 8 9 r 0 © TA €HO с BISON 
91 1 6 L 8 8 9 9 r © © TA £HO £O BISON 
L [4 ç ç r 01 8 9 9 © © IH €HO £O BISON 
с I 9 8 9 9 © Ӯ с 0 0 ТЯ €HO ZO BISON 
SI а 6 а а 8 8 9 9 © © TA €HO (49) BISON 
01 01 6 01 01 8 9 r 9 © © TA €HO 5 VISON 
Ӯ ա 9 9 r r r r © © TA €HO IO BISON 
9 9 ç ç ç 9 8 9 9 [4 T TA 320) IO BISON 
I © L 6 8 9 9 0 © TA 13510) TO BISON 
С ç 01 8 9 r [4 [4 ra CHO Jd BISON 
8 8 9 r © ré ra CHO Jd BISON 
I € 8 8 9 r 0 T ҮЧ CHO Jd REN 
8 с с 01 с 8 8 9 9 © © ҮЧ THO so VISON 
8 [4 ա ç 9 9 01 8 Ӯ 0 © ҮЧ THO so BISON 
8 9 r r 8 01 01 9 9 © © ra THO со BISON 
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6 8 8 ғ Ӯ 0 © eu €HO նթ) BISON 

€ ç 9 9 ն 0 © еч €HO £O BISON 

8 8 L 9 ա 0 © е €HO £O VISON 

Ӯ 8 L 9 r 0 © е €HO £O PRISON 

9 6 8 01 9 9 © [4 ЄМ €HO TƏ VISON 

с с 8 8 9 r 0 © е €HO (49) BISON 

9 Y 6 8 9 Ӯ 0 © е €HO (69) VISON 

ç ç r 8 9 Ӯ © © е €HO լթ) BISON 

Ӯ 8 8 9 Ӯ 0 © е €HO 12 BISON 

© 9 8 9 9 © [4 е €HO 12 VISON 

L € r ғ 9 1 9 9 r 0 © TA €HO jd BISON 
9 $ 9 9 r © 8 9 r 0 © TA €HO jd vISON 
L Ӯ с Ӯ 9 с 8 9 Ӯ С © TA €HO jd BISON 
L L 9 r 6 8 8 9 9 © © TA €HO so vISON 
r © r ç 8 L 8 9 9 [4 z TA €HO 6) VISON 
9 ғ 9 € 9 L 01 9 9 © © TA €HO со BISON 
© r 9 ç 1 с 9 9 9 © © TA €HO նթ) BISON 
€ 9 Y Ӯ 9 9 Ӯ © © TA €HO то VISON 

I [4 ç L L 8 9 9 0 © TA €HO 149) vISON 

ç 1 9 1 8 9 9 [4 с TA £HO £O VISON 

ç 9 £ 9 8 9 9 © © 1 €HO £O BISON 

01 TI 01 01 8 9 9 с © TA €HO £O VISON 

ç с 9 L 9 r Ӯ 0 © TA €HO (69) BISON 

r r L 9 9 9 9 [4 [4 TA ЕН2 TƏ VISON 

© © 9 9 r [4 [4 0 [4 TA ЕН2 22 VISON 
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9 ա @ © ç 01 01 8 9 Ӯ @ ҮЧ €HO jd BISON 
1 ç 8 8 8 r r © ra €HO jd BISON 

9 r r r II 01 8 9 r T © ҮЧ £HO Jd BISON 
8 с 8 1 8 8 8 9 r [4 [4 уч €HO со VISON 
ç ç 9 9 8 8 8 9 Ӯ 0 © ҮЧ €HO so BISON 
© © © ç r 9 8 8 9 © © ҮЧ €HO so PRISON 
© ç © © 9 L 8 r tr 0 © vH €HO ՆԹ) BISON 
ե 01 8 9 ն ն 0 © vA €HO то тоқ 

© I © r 8 9 8 9 9 0 © ҮЧ €HO 149) BISON 
L L 9 Ӯ Ӯ 0 © ra €HO £O VISON 

01 01 01 9 9 ГА © РУ €HO £O BISON 

01 9 9 9 r ғ 0 © уч €HO £D BISON 

I I © L ç 9 С © 0 © vA €HO (49) VISON 
L L 9 9 © [4 © ҮЧ €HO (69) BISON 

9 9 с 9 ғ Ӯ 0 © ҮЧ €HO (69) BISON 

I 8 8 9 r 0 © ra €HO 12 BISON 

© © 1 6 L 8 9 r 0 © ra €HO IO BISON 
6 8 8 8 6 8 8 r r © © ra €HO լթ) BISON 
Y 9 © Y ç с 8 8 Ӯ © © е €HO jd BISON 
L L L ç Ӯ 8 8 8 Ӯ © © е €HO jd vISON 
01 9 © с ç ç 8 9 v T T е €HO jd BISON 
1 r ç 9 9 9 © © е €HO so BISON 

ç 9 8 9 r T fé ЕЯ €HO so BISON 

© ç 8 r r [4 © е €HO so BISON 

6 L 01 8 8 8 8 9 © 0 © е €HO 149) BISON 
8 L 8 9 9 © © е €HO то BISON 
Chapter 6
Generalized Linear Mixed Models
for Proportions and Percentages


6.1 Response Variables as Ratios and Percentages 


In this chapter, we will review generalized linear mixed models (GLMMs) whose
response is either a proportion or a percentage. By proportion and percentage data,
we mean data whose expected value lies between 0 and 1 or between 0 and 100,
respectively. For the remainder of this book, we will refer to this type of data only in
terms of proportions, keeping in mind that a proportion is converted to the percentage
scale simply by multiplying it by 100. Proportions can be classified into two types:
discrete and continuous. Discrete proportions arise when the unit of observation
consists of $N$ distinct entities, $y$ of which have the attribute of interest. $N$ must be
a positive integer and $y$ a nonnegative integer with $y \le N$. Therefore, the observed
proportion must be a discrete fraction, which can take the values
$0/N, 1/N, \ldots, N/N$. A binomial random variable is the sum of a series of $N$
independent binary trials (i.e., trials with only two possible outcomes: success or
failure), where all trials have the same probability of success. For binary and
binomial distributions, the target of inference is the value of the parameter $\pi$ such
that $0 < E(y/N) = \pi < 1$. Continuous proportions (ratios) arise when the researcher
measures responses such as the fraction of the area of a leaf infested with a fungus,
the proportion of damaged cloth in a square meter, the fraction of a contaminated
area, and so on. As with the binomial parameter $\pi$, continuous proportions
(fractions) take values between 0 and 1, but, unlike the binomial, they do not result
from a set of Bernoulli trials. Instead, the beta distribution is most often used when
the response variable is a continuous proportion. In the following sections, we will
first address modeling issues for binary and binomial data. When the response
variable is binomial, we have the option of using a linearization method
(pseudo-likelihood (PL)) or the Laplace or quadrature integral approximation
(Stroup 2012).
6.2 Analysis of Discrete Proportions: Binary
and Binomial Responses


A binomial random variable is the number of successes in a series of $N$ independent
binary trials, known as Bernoulli trials (i.e., trials with two possible outcomes:
success or failure), where all trials have the same probability of success. In the
context of a GLMM, there are several binomial responses, each of which is the result
of a set of binary trials. The $i$th response consists of two pieces of information: the
number of trials $N_i$ and the number of successes $y_i$, as shown in the following
example.


6.2.1 Completely Randomized Design (CRD): Methylation 
Experiment 


An agent to induce demethylation is applied to plants; this agent converts methylated 
nucleotides to their unmethylated forms, thus causing epigenetic changes that 
produce or induce abnormal phenotypes such as deformation or stunting (Amoah 
et al. 2008). A pilot study was implemented to investigate the relationship between 
the dose of the demethylating agent and the observed proportion of plants with a 
normal phenotype. Seeds were treated with the demethylating agent at six different 
doses, including the control. Plants were sown in trays, with each tray containing 
seeds previously treated with the same dose of the demethylating agent. Each dose 
was replicated 4 times: 2 trays with 60 plants and 2 trays with 100 plants. The trays
were allocated following a completely randomized design (CRD). The number of
plants with a normal phenotype in each tray is shown in Table 6.1, together with the
number of plants per tray (N). The notation 59(60) indicates that 59 normal plants
were found out of 60 plants under study. In the same way, the notation 14(100)
indicates that 14 normal plants were found out of 100 plants under study.
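As a quick check on Table 6.1, the sketch below (in Python, chosen here only for illustration; the book itself uses SAS) computes the observed proportion in each tray and the pooled proportion per dose. The dictionary name `TABLE_6_1` and the helper functions are hypothetical; only the counts are transcribed from the table. The pooled proportions agree closely with the "Mean" column reported later in Table 6.3.

```python
# Table 6.1: for each dose, the four trays as (normal plants y, plants per tray N).
TABLE_6_1 = {
    0.0: [(59, 60), (58, 60), (99, 100), (98, 100)],
    0.01: [(58, 60), (59, 60), (98, 100), (99, 100)],
    0.1: [(54, 60), (53, 60), (88, 100), (87, 100)],
    0.5: [(4, 60), (11, 60), (14, 100), (15, 100)],
    1.0: [(3, 60), (2, 60), (2, 100), (1, 100)],
    1.5: [(3, 60), (3, 60), (1, 100), (3, 100)],
}


def tray_proportions(trays):
    """Observed proportion y/N for each tray of one dose."""
    return [y / n for y, n in trays]


def pooled_proportion(trays):
    """Total normal plants divided by total plants across the trays of one dose."""
    total_y = sum(y for y, _ in trays)
    total_n = sum(n for _, n in trays)
    return total_y / total_n


for dose, trays in TABLE_6_1.items():
    props = ", ".join(f"{p:.3f}" for p in tray_proportions(trays))
    print(f"dose {dose:<4}: trays [{props}]  pooled {pooled_proportion(trays):.5f}")
```

Pooling weights each tray by its size, which is why a 100-plant tray counts more than a 60-plant tray at the same dose.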

The sources of variation and degrees of freedom (DFs) for this experiment are 
shown in Table 6.2. 


Table 6.1 Number of normal plants out of a total of N plants per tray and dose of the
demethylating agent

Dose    0         0.01      0.1       0.5       1         1.5
        59(60)    58(60)    54(60)    4(60)     3(60)     3(60)
        58(60)    59(60)    53(60)    11(60)    2(60)     3(60)
        99(100)   98(100)   88(100)   14(100)   2(100)    1(100)
        98(100)   99(100)   87(100)   15(100)   1(100)    3(100)


Table 6.2 Sources of variation and degrees of freedom

Sources of variation    Degrees of freedom
Dose                    t - 1 = 6 - 1 = 5
Error                   t(r - 1) = 6 x (4 - 1) = 18
Total                   t x r - 1 = 6 x 4 - 1 = 23
Fig. 6.1 Effect of the demethylating agent on the proportion of normal plants (observed proportion versus dose)
The statistical model of a completely randomized design (CRD) is

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

where $y_{ij}$ is the number of normal plants observed in tray $j$ ($j = 1, 2, 3, 4$) at
dose $i$ ($i = 1, 2, \ldots, 6$), $\mu$ is the overall mean, $\tau_i$ is the effect of dose $i$ of
the demethylating agent, and $\varepsilon_{ij}$ are non-normal errors.

The number of normal plants $y_i$ observed out of $N_i$ trials follows a binomial
distribution, $y_i \sim \mathrm{Binomial}(N_i, \pi_i)$, where $\pi_i$ is the probability of
success in each trial, with $0 < \pi_i < 1$. Thus, the probability of observing an
outcome $y_i$ can be written as

$$P(Y_i = y_i \mid N_i, \pi_i) = \binom{N_i}{y_i}\,\pi_i^{y_i}\,(1 - \pi_i)^{N_i - y_i}$$


This probability depends on the known number of trials $N_i$, whereas the probability
of success $\pi_i$ is an unknown parameter. In Fig. 6.1, we observe that the
probability of obtaining a normal plant depends on the applied dose of the
demethylating agent. Given that $y_i$ has a binomial distribution, its expected value
(the mean) is the product of the number of trials and the probability of success in
each trial, that is, $E(y_i) = N_i\pi_i$. Since the number of trials is fixed once the data
have been obtained, modeling the probability of success is equivalent to modeling the
expected value, and also the variance, because the variance is likewise a function of
the number of trials and the probability of success. So, the expected value and
variance of $y_i$ are
$$E(y_i) = \mu_i = N_i\pi_i, \qquad \mathrm{Var}(y_i) = N_i\pi_i(1 - \pi_i).$$


This variance is small when $\pi_i$ is close to 0 or 1, and it reaches its maximum when
$\pi_i = 0.5$. This can be seen in Fig. 6.1, where proportions close to 0 or 1 show less
variance than do the proportions between 0.1 and 0.2 observed at a demethylating
agent dose of 0.5. This variance can also be written in terms of the expected value
as:


$$\mathrm{Var}(y_i) = \frac{\mu_i}{N_i}\,(N_i - \mu_i).$$
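The binomial probability, mean, and variance above can be checked numerically. The Python sketch below (function names are hypothetical, chosen for illustration) evaluates the probability mass function with `math.comb`, computes $N\pi$ and $N\pi(1-\pi)$, and confirms that the mean-based form $\mu(N-\mu)/N$ gives the same variance. The probabilities used are merely examples, and $N = 60$ matches the smaller trays in Table 6.1.

```python
from math import comb


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def binomial_mean_var(n, p):
    """Mean n*p and variance n*p*(1 - p) of a Binomial(n, p) count."""
    return n * p, n * p * (1 - p)


def var_from_mean(mu, n):
    """The same variance written in terms of the mean: mu * (n - mu) / n."""
    return mu * (n - mu) / n


n = 60
for p in (0.02, 0.1375, 0.5, 0.98):
    mean, var = binomial_mean_var(n, p)
    print(f"p={p:<6} mean={mean:6.2f} var={var:6.3f} "
          f"var_from_mean={var_from_mean(mean, n):6.3f}")

# The pmf over y = 0..n should sum to 1 and reproduce the mean n*p.
probs = [binomial_pmf(y, n, 0.1375) for y in range(n + 1)]
print(sum(probs), sum(y * q for y, q in enumerate(probs)))
```

Scanning `p` from 0.01 to 0.99 shows the variance peaking at `p = 0.5`, which is the pattern described for Fig. 6.1.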


In this CRD, the $t$ treatments (doses) were randomly assigned to the experimental
units (trays), with $r$ trays per treatment. The linear predictor describing the
structure of the mean of this GLMM is

$$\eta_i = \eta + \tau_i$$

where $\eta_i$ denotes the $i$th linear predictor, $\eta$ is the intercept, and $\tau_i$ is the
fixed effect due to treatment $i$ ($i = 1, 2, \ldots, t$), with $t$ treatments and $r_i$
replicates in each treatment.

The components that define this GLMM are shown below: 


Distribution: $y_i \sim \mathrm{Binomial}(N_i, \pi_i)$
Linear predictor: $\eta_i = \eta + \tau_i$
Link function: $\operatorname{logit}(\pi_i) = \log\left(\dfrac{\pi_i}{1 - \pi_i}\right) = \eta_i$

where $\eta_i$ is the linear predictor that relates the effect of dose $i$
($i = 1, 2, \ldots, 6$) to the probability $\pi_i$. The model uses the linear predictor
$\eta_i$ to estimate the mean proportion $\pi_i$ of the observations for each treatment.
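The logit link and its inverse can be sketched directly. With one fixed effect per dose, no random effects, and the canonical logit link, the maximum likelihood estimate of each $\eta_i$ is the logit of that dose's pooled proportion. So, assuming the Laplace fit here reduces to ordinary maximum likelihood, the model-scale estimate for dose 0 should be close to $\operatorname{logit}(314/320)$. The function names below are hypothetical.

```python
import math


def logit(p):
    """Link function: log-odds log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse link: maps a linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Pooled proportion of normal plants at dose 0 (Table 6.1): 314 of 320 plants.
p_dose0 = 314 / 320
eta_dose0 = logit(p_dose0)
print(round(eta_dose0, 4))             # value on the model (logit) scale
print(round(inv_logit(eta_dose0), 5))  # back-transformed to the data scale
```

The model-scale value can be compared with the "Estimate" entry for dose 0 in Table 6.3 (3.9580).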

The following GLIMMIX program fits a CRD with a binomial response: 


proc glimmix nobound method=Laplace; 
class Dose Rep; 

model y/N= dose/link=logit; 

lsmeans dose/lines ilink; 

run; 


In this example, the distribution of the dataset was not specified to GLIMMIX in the
MODEL statement because, with the events/trials syntax "y/N", proc GLIMMIX
automatically assumes a binomial distribution. It is also important to note that the
variables Dose and Rep were declared as class variables in the CLASS statement,
which Statistical Analysis Software (SAS) interprets as explanatory variables that are
nonnumerical factors. However, the variable "Rep" is not used in the model
specification.
Table 6.3 Results of the analysis of variance

(a) Fit statistics for conditional distribution

-2 Log L (y | r. effects)    83.46
Pearson's chi-square         11.95
Pearson's chi-square/DF      0.50

(b) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Dose     5        15       132.53    <0.0001

(c) Dose least squares (LS) means

Dose   Estimate   Standard error   Pr > |t|   Mean      Standard error mean
0      3.9580     0.4122           <0.0001    0.9813    0.007581
0.01   3.9580     0.4122           <0.0001    0.9813    0.007581
0.1    2.0049     0.1728           <0.0001    0.8813    0.01808
0.5    -1.8360    0.1623           <0.0001    0.1375    0.01925
1      -3.6633    0.3580           <0.0001    0.02501   0.008729
1.5    -3.4337    0.3212           <0.0001    0.03126   0.009728


Part of the results is shown in Table 6.3. The value of Pearson's chi-square statistic
divided by its degrees of freedom in part (a) (Pearson's chi-square/DF = 0.50)
indicates that there is no evidence of overdispersion in the dataset. The analysis of
variance (ANOVA) in part (b) of Table 6.3, with the type III tests of fixed effects,
indicates a highly significant difference (P < 0.0001) in the average proportion of
normal plants among the doses applied to the seeds.
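The Pearson chi-square in part (a) of Table 6.3 can be reproduced approximately by hand. The sketch below assumes that the fitted probability for each dose equals the pooled observed proportion (the maximum likelihood solution for a one-factor binomial model with a logit link). It then sums $(y - N\hat{\pi})^2 / [N\hat{\pi}(1-\hat{\pi})]$ over all 24 trays. The function name is hypothetical.

```python
# Table 6.1: (normal plants y, plants per tray N) for each dose.
TABLE_6_1 = {
    0.0: [(59, 60), (58, 60), (99, 100), (98, 100)],
    0.01: [(58, 60), (59, 60), (98, 100), (99, 100)],
    0.1: [(54, 60), (53, 60), (88, 100), (87, 100)],
    0.5: [(4, 60), (11, 60), (14, 100), (15, 100)],
    1.0: [(3, 60), (2, 60), (2, 100), (1, 100)],
    1.5: [(3, 60), (3, 60), (1, 100), (3, 100)],
}


def pearson_chi_square(data):
    """Pearson chi-square for a binomial model whose fitted probability
    for each dose is that dose's pooled proportion."""
    chi2 = 0.0
    for trays in data.values():
        p_hat = sum(y for y, _ in trays) / sum(n for _, n in trays)
        for y, n in trays:
            expected = n * p_hat
            variance = n * p_hat * (1 - p_hat)
            chi2 += (y - expected) ** 2 / variance
    return chi2


chi2 = pearson_chi_square(TABLE_6_1)
print(f"Pearson chi-square = {chi2:.2f}")
```

A value of this statistic that is small relative to its degrees of freedom (ratio well below 1) is what the text reads as no evidence of overdispersion. A ratio well above 1 would suggest extra-binomial variation.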

The output obtained with the "lsmeans" statement together with the "ilink" option is
in the "Mean" column (part (c) of Table 6.3). These values are the estimates
$\hat{\pi}_i$, i.e., the estimated probabilities of normal plants $\hat{\pi}_{0} = 0.9813$ and
$\hat{\pi}_{0.01} = 0.9813$ for the treatments with doses 0 and 0.01, respectively. For
treatments with doses of 0.1 and 0.5, the estimated probabilities of normal plants are
$\hat{\pi}_{0.1} = 0.8813$ and $\hat{\pi}_{0.5} = 0.1375$, respectively, whereas for doses 1
and 1.5, the estimated probabilities of normal plants decrease dramatically to
$\hat{\pi}_{1} = 0.02501$ and $\hat{\pi}_{1.5} = 0.03126$, respectively.
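The "Mean" and "Standard error mean" columns can be recovered from the model-scale "Estimate" and "Standard error" columns. In the sketch below, the inverse logit gives the mean. The standard error is approximated by a first-order delta method, multiplying the model-scale standard error by the derivative of the inverse link, $\pi(1-\pi)$. This reproduces the printed values to rounding. The helper names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Model-scale LS means and standard errors from Table 6.3, part (c).
LSMEANS = {0: (3.9580, 0.4122), 0.01: (3.9580, 0.4122), 0.1: (2.0049, 0.1728),
           0.5: (-1.8360, 0.1623), 1: (-3.6633, 0.3580), 1.5: (-3.4337, 0.3212)}


def data_scale(estimate, se):
    """Back-transform a logit-scale estimate; the SE uses the delta method,
    with d(inv_logit)/d(eta) = p * (1 - p)."""
    p = inv_logit(estimate)
    return p, p * (1 - p) * se


for dose, (est, se) in LSMEANS.items():
    p, se_p = data_scale(est, se)
    print(f"dose {dose:<4}: mean = {p:.4f}, SE(mean) = {se_p:.6f}")
```

Because $\pi(1-\pi)$ shrinks near 0 and 1, a large model-scale standard error can still produce a small data-scale standard error for doses with probabilities near the boundaries.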

Figure 6.2 shows the mean comparisons (least significant difference (LSD)) of the
estimated probabilities according to the dose applied to the seeds in trays. In this
figure, we observe that for dose = 0 (control) and dose = 0.01, the estimated
proportions of normal plants are not statistically different from each other, but they
do differ from those at the other doses. At a dose of 0.1, the estimated proportion of
normal plants was 88.13%, which was statistically different from all the other doses.
Finally, at doses of 0.5, 1, and 1.5 of the demethylating agent, the estimated
proportion of normal plants decreased drastically to 13.75%, 2.501%, and 3.126%,
respectively. The doses of 1 and 1.5 produced statistically equal proportions of
normal plants.
Fig. 6.2 Comparison of the estimated probabilities per dose of the demethylating agent


If the researcher wishes to model how the dose level of the demethylating agent
affects the proportion of normal plants, then dose must be treated as a continuous
variable (i.e., it is not listed in the CLASS statement). The following SAS syntax with
proc GLIMMIX runs a binomial regression:


proc glimmix data=crd_bin method=Laplace plots=all;
class Rep;

model y/N = dose / solution;

random Rep;

run; quit;


Most of the statements and options have already been discussed throughout this
book; the events/trials syntax "model y/N" indicates that the response variable is a
ratio of counts. Therefore, this dataset is modeled with a binomial distribution, which
accounts for the different number of individuals in each repetition. proc GLIMMIX
interprets the distribution of the data as binomial, whereas the "solution" option
requests the parameter estimates of the model (intercept and slope).

The components that define this GLMM are shown below: 


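Distribution: $y_i \sim \mathrm{Binomial}(N_i, \pi_i)$
Linear predictor: $\eta_i = \eta + \beta \times \mathrm{dose}_i$
Link function: $\operatorname{logit}(\pi_i) = \log\left(\dfrac{\pi_i}{1 - \pi_i}\right) = \eta_i$

Thus, the model can be written as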
Table 6.4 Regression analysis results

(a) Fit statistics

-2 Log likelihood                                        231.58
Akaike information criterion (AIC) (smaller is better)   235.58
AICC (smaller is better)                                 236.15
Bayesian information criterion (BIC) (smaller is better) 237.93
CAIC (smaller is better)                                 239.93
HQIC (smaller is better)                                 236.20
Pearson's chi-square                                     2317.12
Pearson's chi-square/DF                                  96.55

(b) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Dose     1        19       475.97    <0.0001

(c) Solutions for fixed effects

Effect      Estimate   Standard error   DF   t-value   Pr > |t|
Intercept   2.7927     0.1302           3    21.46     0.0002
Dose        -7.6232    0.3494           19   -21.82    <0.0001


$$\eta_i = \log\left(\frac{\mu_i}{N_i - \mu_i}\right) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \operatorname{logit}(\pi_i) = \eta + \beta\,\mathrm{dose}_i$$

and, inverting the logit link, the probability of success $\pi_i$ can be written as

$$\pi_i = \frac{1}{1 + \exp(-\eta_i)}$$

Part of the SAS output of the GLIMMIX syntax is shown in Table 6.4: the
goodness-of-fit statistics in part (a), the type III tests of fixed effects in part (b), and
the parameter estimates in part (c). The analysis of variance indicates that the
demethylating agent has a highly significant effect on the observed proportion of
normal plants (P < 0.0001) (part (b)). The maximum likelihood estimates for the
intercept and slope are $\hat{\eta} = 2.7927$ and $\hat{\beta} = -7.6232$, respectively.

Figure 6.3 shows that as the value of the linear predictor ($\eta_i$) increases, the value
of the residuals rapidly decreases. We can also see that the residuals plotted against
the quantiles clearly do not follow a normal distribution, because this model is not a
linear function of the explanatory variable "dose."

Figure 6.4 shows that the observed and fitted proportions are not far apart, and, as
such, the binomial model is suitable for this dataset. The estimated linear predictor of
this model is as follows:

$$\hat{\eta}_i = \hat{\eta} + \hat{\beta} \times \mathrm{dose}_i = 2.7927 - 7.6232 \times \mathrm{dose}_i.$$
Fig. 6.3 A graph of residuals versus the linear predictor and quantiles
Fig. 6.4 Observed and estimated proportions


The logit of the probability of success is a linear function of the explanatory
variable, so the model can be written in terms of the probability of success
(observing normal plants) as

$$\pi_i = \frac{1}{1 + \exp(-\eta_i)}$$

Given the parameter estimates, we can predict the probability of observing a normal
plant for a given concentration of the demethylating agent; this estimated probability
(computed from the estimated linear predictor) is plotted in Fig. 6.4:

$$\hat{\pi}_i = \frac{1}{1 + \exp(-\hat{\eta}_i)} = \frac{1}{1 + \exp(-2.7927 + 7.6232 \times \mathrm{dose}_i)}$$
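The fitted curve can be evaluated at the doses used in the experiment with a few lines of Python. The sketch below plugs the estimates from Table 6.4 into the inverse logit. It also reports $\exp(0.1\hat{\beta})$, the multiplicative change in the odds of a normal plant for a 0.1 increase in dose. The constant and function names are hypothetical.

```python
import math

INTERCEPT, SLOPE = 2.7927, -7.6232  # estimates from Table 6.4, part (c)


def predicted_probability(dose):
    """Estimated probability of a normal plant at a given dose."""
    eta = INTERCEPT + SLOPE * dose
    return 1.0 / (1.0 + math.exp(-eta))


for dose in (0, 0.01, 0.1, 0.5, 1.0, 1.5):
    print(f"dose {dose:<4}: pi_hat = {predicted_probability(dose):.4f}")

# exp(slope * 0.1): multiplicative change in the odds per 0.1 increase in dose.
print(f"odds ratio per 0.1 dose units: {math.exp(SLOPE * 0.1):.3f}")
```

Comparing these predictions with the pooled observed proportions per dose is one way to read the agreement shown in Fig. 6.4.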


6.3 Factorial Design in a Randomized Complete Block 
Design (RCBD) with Binomial Data: Toxic Effect 
of Different Treatments on Two Species of Fleas 


A group of researchers wished to study the toxic effect of certain treatments (Trt) on
two flea species (SP) (Daphnia magna and Ceriodaphnia dubia). To compare the
toxicity of the treatments on both flea species, a randomized complete block design
(RCBD) with bioassays as blocks was implemented, with three replicates per
treatment and each replicate consisting of 10 fleas (Appendix: Fleas). The linear
predictor describing this experiment is given below:


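$$\eta_{ijkl} = \eta + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \mathrm{bioassay}_k + \mathrm{rep(bioassay)}_{kl}$$

where $\eta$ is the intercept, $\alpha_i$ is the fixed effect due to species $i$, $\beta_j$ is the
fixed effect of treatment $j$, $(\alpha\beta)_{ij}$ is the fixed interaction effect between flea
species and treatment, $\mathrm{bioassay}_k$ is the random effect due to bioassay $k$,
assuming $\mathrm{bioassay}_k \sim N(0, \sigma^2_{\mathrm{bioassay}})$, and $\mathrm{rep(bioassay)}_{kl}$ is
the random effect due to replicate $l$ within bioassay $k$, assuming
$\mathrm{rep(bioassay)}_{kl} \sim N(0, \sigma^2_{\mathrm{rep(bioassay)}})$.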
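The remaining components of this GLMM with a binomial response are described
below:

Distribution: $y_{ijkl} \mid \mathrm{bioassay}_k, \mathrm{rep(bioassay)}_{kl} \sim \mathrm{Binomial}(N_{ijkl}, \pi_{ijkl})$,
with $\mathrm{bioassay}_k \sim N(0, \sigma^2_{\mathrm{bioassay}})$ and
$\mathrm{rep(bioassay)}_{kl} \sim N(0, \sigma^2_{\mathrm{rep(bioassay)}})$, where $y_{ijkl}$ is the number of
surviving fleas (variable Sobrevi) out of $N_{ijkl}$ fleas observed in species $i$ in
replicate $l$ in bioassay $k$ under treatment $j$, and $\pi_{ijkl}$ is the corresponding
probability of survival

Link function: $\operatorname{logit}(\pi_{ijkl}) = \log\left(\dfrac{\pi_{ijkl}}{1 - \pi_{ijkl}}\right) = \eta_{ijkl}$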
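To make the conditional structure of this GLMM concrete, the sketch below simulates counts from it. It draws a normal effect for each bioassay and for each replicate within bioassay, adds them to the fixed effects on the logit scale, and then draws a binomial count of surviving fleas as a sum of Bernoulli trials. All parameter values, the number of bioassays, and the function names are hypothetical and chosen only for illustration. In particular, the positive standard deviations are assumptions, since the fitted variance components reported for this experiment are negative.

```python
import math
import random


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_fleas(eta0, alpha, beta, alpha_beta, sd_bioassay, sd_rep,
                   n_bioassays=3, n_reps=3, n_fleas=10, seed=1):
    """Draw y_ijkl | random effects ~ Binomial(n_fleas, pi_ijkl).
    Returns tuples (species i, treatment j, bioassay k, replicate l, y, N)."""
    rng = random.Random(seed)
    data = []
    for k in range(n_bioassays):
        b_k = rng.gauss(0.0, sd_bioassay)        # bioassay_k
        for l in range(n_reps):
            r_kl = rng.gauss(0.0, sd_rep)        # rep(bioassay)_kl
            for i, a_i in enumerate(alpha):      # species i
                for j, b_j in enumerate(beta):   # treatment j
                    eta = eta0 + a_i + b_j + alpha_beta[i][j] + b_k + r_kl
                    p = inv_logit(eta)
                    # Binomial count as a sum of n_fleas Bernoulli(p) trials.
                    y = sum(rng.random() < p for _ in range(n_fleas))
                    data.append((i, j, k, l, y, n_fleas))
    return data


sim = simulate_fleas(eta0=0.0, alpha=[0.0, 0.2],
                     beta=[3.0, 2.0, 1.0, -1.0, -2.0, -3.0],
                     alpha_beta=[[0.0] * 6, [0.0] * 6],
                     sd_bioassay=0.3, sd_rep=0.3)
print(len(sim), sim[:3])
```

Simulating from a fitted or assumed model in this way is a common check on whether the design can detect effects of a given size.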


The following SAS syntax allows us to fit the GLMM with a binomial response. 
Table 6.5 Results of the analysis of variance

(a) Fit statistics

-2 Log likelihood            145.33
AIC (smaller is better)      173.33
AICC (smaller is better)     177.85
BIC (smaller is better)      160.71
CAIC (smaller is better)     174.71
HQIC (smaller is better)     147.97

(b) Fit statistics for conditional distribution

-2 Log L (Sobrevi | r. effects)   145.33
Pearson's chi-square              10.72
Pearson's chi-square/DF           0.10

(c) Covariance parameter estimates

Cov Parm        Estimate   Standard error
Bioen           -0.1051
Bioen*SP(Rep)   -0.1192

(d) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
SP       1        14       0.02      0.8829
Trt      5        80       15.08     <0.0001
SP*Trt   5        80       4.66      0.0009


proc glimmix data=pulgas nobound method=laplace;
class Bioen SP Trt Rep;

model Sobrevi/n = SP|Trt / dist=binomial;

random Bioen SP*Bioen(Rep);

lsmeans SP|Trt / lines ilink;

run;


Part of the results is listed in Table 6.5. The fit statistics in part (a) and the
conditional fit statistics in part (b) are useful for model comparison, whereas the
variance component estimates are shown in part (c). The value of the statistic
Pearson's chi-square/DF = 0.10 indicates that the binomial model gives a good fit to
the dataset. The variance component estimates for bioassays and for replicates
nested in bioassays are $\hat{\sigma}^2_{\mathrm{bioassay}} = -0.1051$ and
$\hat{\sigma}^2_{\mathrm{rep(bioassay)}} = -0.1192$, respectively; negative values can appear
because the nobound option removes the nonnegativity constraint on the covariance
parameters. The type III tests of fixed effects (part (d)) show the significance tests of
the fixed effects in the model. The treatment effect and the interaction between flea
species (SP) and treatment are clearly significant, with P < 0.0001 and P = 0.0009,
respectively.

Since survival was statistically similar in both flea species (P = 0.8829), we will focus
on the factors that were significant. Part (a) of Table 6.6 shows the means and
standard errors of treatments on the model scale ("Estimate" column) and on the
data scale ("Mean" column), obtained with "lsmeans" and the "ilink" option, as well
as the mean comparisons, which are on the model scale (part (b)).
Table 6.6 Means and standard errors on the model scale and on the data scale

(a) Trt least squares means

Trt   Estimate   Standard error   DF   t-value   Pr > |t|   Mean       Standard error mean
T1    8.1179     4.3180           80   1.88      0.0637     0.9997     0.001287
T2    4.3564     3.0554           80   1.43      0.1578     0.9873     0.03820
T3    1.0081     0.1924           80   5.24      <0.0001    0.7326     0.03768
T4    -1.0509    0.1712           80   -6.14     <0.0001    0.2591     0.03286
T5    -4.7187    3.0570           80   -1.54     0.1266     0.008848   0.02681
T6    -8.1182    4.3184           80   -1.88     0.0638     0.000298   0.001286

(b) Conservative T grouping of Trt least squares means (alpha = 0.05)
LS means with the same letter are not significantly different

Trt   Estimate
T1    8.1179     A
T2    4.3564     B A
T3    1.0081     B A C
T4    -1.0509    B D C
T5    -4.7187    D C
T6    -8.1182    D

The LINES display does not reflect all significant comparisons. The following
additional pairs are significantly different: (T3, T4)


Based on the fixed effects tests, the flea species x treatment interaction is
significant. The means on the model scale are listed under the "Estimate" column,
followed by their standard errors, "Standard error" (Table 6.7). The "ilink" option in
"lsmeans" applies the inverse of the link function to the estimates on the model scale
to obtain the estimates on the data scale. The probabilities on the data scale are
given under the "Mean" column, with their respective standard errors, and
correspond to the probability of flea survival.

Figure 6.5 shows that the survival of the two species differs in treatments 2 to 5: the
Daphnia species showed more resistance in treatments 2 and 3, whereas the
Ceriodaphnia species showed greater resistance in treatments 4 and 5. On the other
hand, in treatments 1 and 6, survival was similar in both species.


6.4 A Split-Plot Design in an RCBD with a Normal
Response 


The split plot is the most common design structure in agricultural and agro-industrial
research. These experiments generally involve two or more factors under study.
Typically, large or primary experimental units, commonly known as whole plots, are
grouped into blocks. The levels of the first factor are randomly assigned to the whole
plots. Then, each whole plot is divided into smaller units, known as split plots,
subplots, or secondary plots. The levels of the second factor are randomly assigned
to the subplots within each whole plot.
Fig. 6.5 The average survival rate of both species


The model equation for the analysis of variance assuming normality in the response is

$$y_{ijk} = \mu + \alpha_i + r_k + (r\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$

$$i = 1, 2, \ldots, a;\quad j = 1, 2, \ldots, b;\quad k = 1, 2, \ldots, r$$

where $y_{ijk}$ is the observed response in the $k$th block at the $i$th level of factor A
and the $j$th level of factor B, $\alpha_i$ and $\beta_j$ are the fixed treatment effects due to
factors A and B, respectively, $(\alpha\beta)_{ij}$ is the fixed interaction effect between
factors A and B, $r_k$ is the random effect due to blocks, $(r\alpha)_{ik}$ is the whole-plot
random error term (the interaction between blocks and factor A), and $\varepsilon_{ijk}$ is the
random residual effect. The errors and other random terms are usually also assumed
to be normally distributed. This way of specifying the model is valid when the
response variable is normal, but it is not the most appropriate when the response is
not normally distributed.


6.4.1 An RCBD Split Plot with Binomial Data: Carrot Fly 
Larval Infestation of Carrots 


Data were obtained from an experiment designed to compare a number of carrot genotypes with respect to their resistance to infestation by carrot fly larvae. Sixteen genotypes were compared under two levels of pest infestation (the treatments). The experiment was conducted in three randomized blocks. Each block consisted of
Table 6.8 The notation 44/53 denotes that 44 carrots were infested (y) out of a sample of 53 carrots studied (N)

           Treatment (level of infestation) 1      Treatment (level of infestation) 2
Genotype   Block 1   Block 2   Block 3             Block 1   Block 2   Block 3
G1         44/53     42/48     27/51               16/60     9/52      26/54
G2         24/48     35/42     45/52               13/44     20/48     16/53
G3         8/49      16/49     16/50               4/52      6/51      12/43
G4         4/51      5/42      12/46               15/52     10/56     6/48
G5         11/52     13/51     15/44               4/51      6/43      9/46
G6         15/50     5/49      7/50                1/51      8/49      3/54
G7         18/52     13/47     7/47                2/52      4/52      6/52
G8         5/47      15/49     8/50                6/56      4/50      6/42
G9         11/52     6/45      5/51                3/54      8/51      3/53
G10        0/51      10/39     14/48               3/50      0/50      10/51
G11        6/52      4/46      10/37               1/52      7/38      4/48
G12        0/52      4/55      1/40                1/50      3/50      1/45
G13        14/45     18/43     4/40                4/51      7/46      7/45
G14        3/52      12/53     4/55                3/52      7/48      12/49
G15        11/52     6/54      5/49                2/50      4/46      14/53
G16        4/53      1/40      4/52                4/56      1/44      3/42


Table 6.9 Sources of variation and degrees of freedom

Sources of variation          Degrees of freedom
Blocks                        r - 1 = 3 - 1 = 2
Factor A (infestation)        a - 1 = 2 - 1 = 1
Error(a) (A*blocks)           (r - 1)(a - 1) = 2
Factor B (genotypes)          b - 1 = 16 - 1 = 15
Infestation*genotype (A*B)    (a - 1)(b - 1) = 15
Error(b)                      a(r - 1)(b - 1) = 2 × 2 × 15 = 60
Total                         r × a × b - 1 = 3 × 2 × 16 - 1 = 95


32 plots, 1 for each combination of genotype and pest infestation level. At the end of 
the experiment, about 50 carrots were taken from each plot and assessed for 
infestation by carrot fly larvae. The data obtained are shown in Table 6.8. 

Table 6.9 shows the analysis of variance summarizing the sources of variation 
and degrees of freedom. 
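As a quick arithmetic check on Table 6.9, the degrees of freedom can be computed from the numbers of blocks (r), whole-plot levels (a), and subplot levels (b). The following Python sketch is only a calculation aid (the function name `split_plot_df` is ours, not part of SAS); it also verifies that the components add up to the total.

```python
def split_plot_df(r, a, b):
    """ANOVA degrees of freedom for a split plot in an RCBD.

    r: number of blocks; a: levels of the whole-plot factor A;
    b: levels of the subplot factor B.
    """
    df = {
        "Blocks": r - 1,
        "A": a - 1,
        "Error(a)": (r - 1) * (a - 1),   # A x Blocks (whole-plot error)
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Error(b)": a * (r - 1) * (b - 1),  # subplot error
    }
    total = r * a * b - 1
    # Sanity check: the components must add up to the total.
    assert sum(df.values()) == total
    df["Total"] = total
    return df


if __name__ == "__main__":
    # Carrot fly example: 3 blocks, 2 infestation levels, 16 genotypes.
    for source, value in split_plot_df(r=3, a=2, b=16).items():
        print(f"{source:<10}{value:>4}")
```

The same function applies to any balanced split plot in an RCBD by changing r, a, and b.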

Rewriting the model in terms of the linear predictor gives

$$\eta_{ijk} = \mu + \alpha_i + r_k + (r\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij}$$


Since the observations were taken at the subplot level, conditional on the structural effects of the design, they have a variance associated with the subplot. In this linear predictor, $\alpha_i$ and $\beta_j$ refer to the fixed treatment effects due to factors A and B, respectively; $(\alpha\beta)_{ij}$ refers to the interaction between these factors; $r_k$ is the random effect due to blocks; and the blocks × whole plot effect $(r\alpha)_{ik}$ is assumed to contribute to the variation such that $r_k \sim N(0, \sigma_r^2)$ and $(r\alpha)_{ik} \sim N(0, \sigma_{r\alpha}^2)$. This model uses the linear predictor $\eta_{ijk}$ to model the mean of the observations through the probability $\pi_{ijk}$.

Table 6.10 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)       527.82
Pearson's chi-square            189.09
Pearson's chi-square/DF           1.97

(b) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Intercept   Bloque    0.004272   0.02741
Trt         Bloque    0.03344    0.03545

(c) Type III tests of fixed effects
Effect          Num DF   Den DF   F-value   Pr > F
Genotype        15       60       28.28     <0.0001
Trt             1        2        16.24     0.0564
Genotype*Trt    15       60       4.45      <0.0001

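The specification of this GLMM is as follows:

Distribution: $y_{ijk} \mid r_k, (r\alpha)_{ik} \sim \text{Binomial}(N_{ijk}, \pi_{ijk})$
$r_k \sim N(0, \sigma_r^2)$, $(r\alpha)_{ik} \sim N(0, \sigma_{r\alpha}^2)$
Link function: $\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \eta_{ijk}$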


The following SAS GLIMMIX program allows the fitting of a GLMM with a 
split-plot structure in a randomized complete block design with a binomial response. 


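proc glimmix data=spd_pp nobound method=quadrature;
   class Genotype Trt Block;
   model y/N = Genotype|Trt;               /* events/trials syntax: binomial response, logit link */
   random intercept Trt / subject=Block;   /* block effect and block x treatment (whole-plot) effect */
   lsmeans Genotype|Trt / lines ilink;     /* ilink also reports means on the probability scale */
run;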


The program uses the adaptive quadrature estimation method (method=quadrature), which produces results similar to those of the Laplace method. Part of the results is provided in Table 6.10. The value of Pearson's chi-square/DF in part (a) indicates whether there is overdispersion (extra variation) in the dataset. In this case, Pearson's chi-square/DF = 1.97 indicates overdispersion, so it is advisable to use either the pseudo-likelihood (PL) estimation method with a scale parameter or a different distribution. In addition, the variance components estimated for blocks and for blocks × treatment (the whole plot) in part (b) are $\hat{\sigma}_r^2 = 0.004272$ and $\hat{\sigma}_{r\alpha}^2 = 0.03344$, respectively. The results of the fixed effects tests (part (c)) indicate that the genotype effect and the genotype × treatment interaction are significant.
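To make the overdispersion check concrete, the sketch below computes Pearson's chi-square for binomial counts from observed successes, numbers of trials, and fitted probabilities. The function name and the toy values are hypothetical; in a real analysis the fitted probabilities come from GLIMMIX. We also assume that the divisor behind the reported ratio is the number of observations, 96 (3 blocks × 2 treatments × 16 genotypes), which reproduces the 1.97 in Table 6.10.

```python
def pearson_chi2_binomial(y, n, p):
    """Sum of squared Pearson residuals for binomial counts.

    y: observed successes, n: numbers of trials, p: fitted probabilities.
    """
    total = 0.0
    for yi, ni, pi in zip(y, n, p):
        expected = ni * pi               # E(y) = N * pi
        variance = ni * pi * (1.0 - pi)  # Var(y) = N * pi * (1 - pi)
        total += (yi - expected) ** 2 / variance
    return total


# Toy example (hypothetical): 8 of 20 carrots infested when the model predicts pi = 0.25.
print(pearson_chi2_binomial([8], [20], [0.25]))  # (8 - 5)^2 / 3.75 = 2.4

# Ratio reported in Table 6.10, assuming division by the 96 observations.
print(round(189.09 / 96, 2))  # 1.97
```

A ratio near 1 is what the binomial model implies; values well above 1, as here, point to extra variation.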

The appropriate method for model evaluation depends on whether there is evidence of overdispersion, so we consider this issue below. The residual variance incorporates systematic discrepancies between the model and the observed responses, variation between replicates (observations on independent experimental units with the same values of the explanatory variables), and sampling variation arising from the distribution of the data, which in this case is the binomial distribution. If there are no replicate observations and the fitted model adequately describes the systematic trend, then only sampling variation contributes to the residual variance. In that case, the residual deviance has an approximate chi-squared distribution with degrees of freedom equal to the residual degrees of freedom (those of the mean squared error, MSE), so its ratio to those degrees of freedom should be close to 1.
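The residual deviance described above can be sketched for binomial counts as follows, using the convention 0·log(0) = 0. The helper names and the toy data are hypothetical; dividing the deviance (or Pearson's chi-square) by its residual degrees of freedom gives the ratio that should be near 1 when only sampling variation is present.

```python
import math


def binomial_deviance(y, n, p):
    """Residual deviance for binomial counts y out of n with fitted probabilities p."""

    def term(observed, fitted):
        # Convention: 0 * log(0 / fitted) = 0.
        return 0.0 if observed == 0 else observed * math.log(observed / fitted)

    deviance = 0.0
    for yi, ni, pi in zip(y, n, p):
        mu = ni * pi  # fitted number of successes
        deviance += 2.0 * (term(yi, mu) + term(ni - yi, ni - mu))
    return deviance


def dispersion_ratio(statistic, df):
    """Deviance (or Pearson chi-square) divided by its degrees of freedom."""
    return statistic / df


# Toy example (hypothetical): 8 of 20 infested when the model predicts pi = 0.25.
print(round(binomial_deviance([8], [20], [0.25]), 4))  # 2.1646
```

A perfect fit (observed equal to fitted counts) gives a deviance of 0.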

Since there is overdispersion in the data using the binomial distribution, there are 
three alternatives we can explore: (1) review the linear predictor, which involves 
carefully revising the analysis of variance table; (2) add a scale parameter; or (3) use 
another distribution for the dataset. Each of these three possible alternatives is 
discussed below, in this order. 


6.4.1.1 Linear Predictor Review ($\eta_{ijk}$)


If the proportion of infested carrots ($\pi_{ijk}$) is affected by the genotype within each infestation level (trt $= \alpha$), from plot to plot within each block, then a nested effect of genotypes within infestation levels (trt) could be included in the analysis of variance. The linear predictor would then be defined as

$$\eta_{ijk} = \mu + \alpha_i + r_k + (r\alpha)_{ik} + \beta(\alpha)_{ij}$$

where $\alpha_i$, $\beta(\alpha)_{ij}$, $r_k$, and $(r\alpha)_{ik}$ are the fixed effect due to treatments, the fixed effect of genotypes nested within a treatment, the random effect due to blocks, $r_k \sim N(0, \sigma_r^2)$, and the random interaction between blocks and treatment, $(r\alpha)_{ik} \sim N(0, \sigma_{r\alpha}^2)$, respectively. The following GLIMMIX syntax fits a model with this linear predictor:


proc glimmix data=spd_pp method=laplace;
   class Genotype Trt Block;
   model y/N = Trt Genotype(Trt);            /* genotypes nested within treatments */
   random intercept Trt / subject=Block;     /* block and block x treatment effects */
   lsmeans Genotype(Trt) / lines ilink slice=Trt slicediff=Trt;
run;


The main change with respect to the previous program is that the genotype main effect and the genotype × treatment interaction have been replaced by the nested effect of genotypes within treatments, Genotype(Trt) (this program also uses method=laplace instead of quadrature). Part of the results is shown in Table 6.11. Neither Pearson's chi-square/DF (part (a)) nor the other fit statistics decreased when the linear predictor was modified. However, the F-values calculated for treatments and for genotypes within treatments (part (c)) are smaller than those obtained with the original split-plot linear predictor.

Table 6.11 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)       527.82
Pearson's chi-square            189.07
Pearson's chi-square/DF           1.97

(b) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Intercept   Bloque    0.004265   0.02740
Trt         Bloque    0.03343    0.03544

(c) Type III tests of fixed effects
Effect          Num DF   Den DF   F-value   Pr > F
Trt             1        2        16.20     0.0565
Genotype(Trt)   30       60       15.83     <0.0001

Since the overdispersion is still present (Pearson's chi-square/DF = 1.97), another alternative is to add a scale parameter to the model. This alternative is presented below.


6.4.1.2 Scale Parameter 


If the residual deviance is larger than expected when compared with the critical values of the appropriate chi-squared distribution, and if this cannot be corrected by redefining the linear predictor of the model, then there is more variation present than can be accounted for by the assumed distribution. In this case, we say that the data show overdispersion. The simplest way to deal with overdispersion is to extend the model by scaling the variance function. Adding the scale parameter replaces the binomial variance function $\pi_{ij}(1 - \pi_{ij})$ with $\phi\,\pi_{ij}(1 - \pi_{ij})$, so that $\text{Var}(y_{ij}) = \phi N_{ij}\pi_{ij}(1 - \pi_{ij})$ instead of $N_{ij}\pi_{ij}(1 - \pi_{ij})$. The rationale for this approach is discussed by Collett (2002). The parameter $\phi$ is a scale factor, called the dispersion parameter, which summarizes the degree of overdispersion present in the observations. Clearly, $\phi = 1$ corresponds to the original distribution model. This parameter can be estimated in several different ways. The logarithm of the likelihood of the binomial distribution is given by


$$\ell = \sum_{i}\sum_{j}\left[y_{ij}\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) + N_{ij}\log(1 - \pi_{ij}) + \log\binom{N_{ij}}{y_{ij}}\right]$$

In the log-likelihood, the term $y_{ij}\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)$ is very important: any quantity that multiplies $y_{ij}$ is known as the natural or canonical parameter, and this parameter is always a function of the mean. For the binomial distribution, the mean is $N_{ij}\pi_{ij}$ and the natural parameter is $\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right)$, which in categorical data analysis is known as the "log odds." The generalized estimating equation (GEE) approach provides a valid analysis of marginal means because, under a binomial distribution with quasi-likelihood, the variance of the distribution is given by $\phi\,\pi_{ij}(1 - \pi_{ij})$. This is achieved by adding the "random _residual_" statement in the following SAS syntax.

The following GLIMMIX commands invoke the scale parameter while keeping the first linear predictor proposed for these data.


proc glimmix data=spd_pp nobound;
   class Genotype Trt Block;
   model y/N = Trt|Genotype;
   random intercept Trt / subject=Block;
   random _residual_;                  /* adds the overdispersion (scale) parameter phi */
   lsmeans Trt|Genotype / lines ilink;
run;


In this syntax, we still keep the binomial distribution (the y/N syntax tells GLIMMIX that the response is binomial) but add the "random _residual_" statement. In this case, we cannot obtain maximum likelihood estimates because the Laplace ("method=laplace") and adaptive quadrature ("method=quad") approximation methods cannot be used, so the estimation is performed with the pseudo-likelihood (PL) method. The scale parameter is then estimated and used to adjust all standard errors and statistical tests. PROC GLIMMIX uses the generalized Pearson statistic of McCullagh and Nelder (1989), that is, $\chi^2/\text{df}$, as the estimator of the scale parameter ($\phi$). All standard errors from the analysis under a binomial distribution are multiplied by $\sqrt{\hat{\phi}}$, and all F-statistics are divided by $\hat{\phi}$ to account for overdispersion. Part of the output is shown in Table 6.12.
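The effect of the scale parameter on the tests can be checked by hand. In the sketch below (plain Python; the function names are ours), the binomial F-values for the subplot effects in Table 6.10 are divided by $\hat{\phi} = 3.1263$ from Table 6.12. The results, about 9.05 and 1.42, are close to the F-values of 9.04 and 1.42 reported in Table 6.12. The whole-plot Trt test does not follow this simple rescaling, presumably because the variance components are also re-estimated under pseudo-likelihood.

```python
import math

PHI_HAT = 3.1263  # estimated scale parameter, Table 6.12 (b)


def scale_adjusted_se(se, phi):
    """Standard errors are multiplied by sqrt(phi)."""
    return se * math.sqrt(phi)


def scale_adjusted_f(f_value, phi):
    """F statistics are divided by phi."""
    return f_value / phi


# Binomial F-values for the subplot effects (Table 6.10, part (c)).
for effect, f_value in [("Genotype", 28.28), ("Genotype*Trt", 4.45)]:
    print(effect, round(scale_adjusted_f(f_value, PHI_HAT), 2))
```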

The generalized chi-square/DF in part (a) shows that overdispersion has not been eliminated; on the contrary, its value has increased to 3.13. This result indicates that adding a scale parameter to the model does not reduce the extra variation present in the dataset, since the binomial assumption imposes a relationship between the mean and the variance that might not hold for the data being analyzed. The estimated scale parameter is $\hat{\phi} = 3.1263$ (part (b)). The variance of the Pearson residuals was 3.6257, which is considerably larger than 1, implying substantial overdispersion. In addition, the results of the fixed effects tests in part (c) of Table 6.12 differ from those obtained above.

Therefore, the third option, which assumes an alternative distribution (the beta distribution) for the response variable, is discussed below.
Table 6.12 Results of the analysis of variance, adding a scale parameter to the model

(a) Fit statistics
-2 Res log pseudo-likelihood    182.52
Generalized chi-square          200.09
Gener. chi-square/DF              3.13

(b) Covariance parameter estimates
Cov Parm                        Subject   Estimate   Standard error
Intercept                       Bloque    0.005416   0.04750
Trt                             Bloque    0.03202    0.06338
Residual variance component (VC)          3.1263     0.5719

(c) Type III tests of fixed effects
Effect          F-value   Pr > F
Trt             10.20     0.0856
Genotype        9.04      <0.0001
Genotype*Trt    1.42      0.1674


6.4.1.3 Alternative Distribution 


Another approach to controlling the overdispersion is to model the data with a distribution defined on the interval (0, 1), such as the beta distribution. Generally, this distribution yields good results when all experimental units have the same number of trials (successes plus failures), that is, when $N_{ij} = N$. When $N_{ij}$ varies only slightly, the beta distribution still yields acceptable results in many cases.

It is important to mention that the proportions come from binomial counts; therefore, we now define the response variable as $p_{ij} = y_{ij}/N_{ij}$ so that it can be modeled with the beta distribution. The components of the beta response model are listed below:

Distribution: $p_{ijk} \mid r_k, (r\alpha)_{ik} \sim \text{Beta}(\pi_{ijk}, \phi)$, with $\phi$ as the scale parameter
$r_k \sim N(0, \sigma_r^2)$, $(r\alpha)_{ik} \sim N(0, \sigma_{r\alpha}^2)$
Linear predictor: $\eta_{ijk} = \mu + \alpha_i + r_k + (r\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij}$
Link function: $\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \eta_{ijk}$

As mentioned before, the response variable is now $p_{ij} = y_{ij}/N_{ij}$, which is not the same as the response variable used with the binomial distribution. The following SAS commands fit a GLMM for a split plot in a randomized complete block design with a beta response. Before implementing this model in SAS GLIMMIX, the variable $p = y/N$ must be defined.
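To see why a beta model can absorb extra variation, the sketch below compares the variance of a beta response with that of a binomial proportion y/N. It assumes the mean-precision parametrization, in which a beta response with mean $\pi$ and scale $\phi$ has variance $\pi(1-\pi)/(1+\phi)$ (to our understanding, the form used by GLIMMIX for dist=beta). The mean 0.2251 and the plot size N = 50 are illustrative values taken from this example; the function names are ours.

```python
def beta_variance(mu, phi):
    """Var(p) for a beta response with mean mu and scale (precision) phi.

    Assumes the mean-precision parametrization: mu * (1 - mu) / (1 + phi).
    """
    return mu * (1.0 - mu) / (1.0 + phi)


def binomial_proportion_variance(mu, n):
    """Var(y / N) when y ~ Binomial(N, mu)."""
    return mu * (1.0 - mu) / n


mu, n = 0.2251, 50  # illustrative mean proportion and a plot of about 50 carrots
for phi in (49.0, 25.7):
    ratio = beta_variance(mu, phi) / binomial_proportion_variance(mu, n)
    print(f"phi = {phi:5.1f}: beta/binomial variance ratio = {ratio:.2f}")
```

With $1 + \phi = N$ the two variances coincide (ratio 1.00). With $\phi$ close to the estimate reported in Table 6.14 (about 25.7), the beta model allows roughly 1.9 times the binomial variance for a plot of 50 carrots.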


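/* Define the proportion of infested carrots before fitting the beta model */
data spd_pp;
   set spd_pp;
   p = y/N;
run;

proc glimmix data=spd_pp nobound method=laplace;
   class Genotype Trt Block;
   model p = Genotype|Trt / dist=beta;    /* beta response with the default logit link */
   random intercept Trt / subject=Block;
   lsmeans Genotype|Trt / lines ilink;
run;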
Table 6.13 Fit statistics under the binomial and beta distributions

(a) Fit statistics
Distribution                   Binomial    Beta
-2 Log likelihood              541.85      -246.49
AIC (smaller is better)        609.85      -176.49
AICC (smaller is better)       648.87      -132.28
BIC (smaller is better)        579.20      -208.04
CAIC (smaller is better)       613.20      -173.04
HQIC (smaller is better)       548.24      -239.91

(b) Fit statistics for conditional distribution
Distribution                   Binomial    Beta
-2 Log L (y | r. effects)      527.82      -254.68
Pearson's chi-square           189.09      93.95
Pearson's chi-square/DF        1.97        1.01


Table 6.14 Results of the analysis of variance, assuming binomial and beta distributions

(a) Covariance parameter estimates
                   Binomial                      Beta
Cov Parm           Estimate   Standard error     Estimate   Standard error
Intercept          0.004272   0.02741            -0.00524   .
Trt                0.03344    0.03545            0.02175    0.1475
Scale (φ)          .          .                  25.7070    .

(b) Type III tests of fixed effects
                                      Binomial             Beta
Effect          Num DF   Den DF       F-value   Pr > F     F-value   Pr > F
Trt             1        4            16.24     0.0564     9.98      0.0342
Genotype        15       60           28.28     <0.0001    13.25     <0.0001
Genotype*Trt    15       60           4.45      <0.0001    2.23      0.0146


Part of the SAS GLIMMIX output is shown in Tables 6.13 and 6.14. The information criteria in part (a) of Table 6.13 are clearly lower under the beta distribution than under the binomial distribution (first alternative), indicating that the beta distribution provides a better fit. Likewise, the three fit statistics for the conditional distribution in part (b) are higher for the binomial model than for the beta model. The value of Pearson's chi-square/DF under the beta distribution is 1.01, which indicates that the overdispersion has been virtually eliminated and that the beta distribution is therefore a better candidate model for this dataset.
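The information criteria in Table 6.13 can be reproduced from the -2 log likelihood with the standard definitions, which we assume match those used by GLIMMIX. The sketch below (function name ours) assumes k = 34 parameters for the binomial model (32 fixed-effect parameters: intercept, 1 for Trt, 15 for genotype, 15 for the interaction; plus 2 variance components) and k = 35 for the beta model (plus the scale parameter). BIC, CAIC, and HQIC use the number of subjects (3 blocks), and AICC uses the number of observations: 96 for the binomial model and 93 for the beta model. The value 93 is the one that reproduces the reported AICC, presumably because the three plots with no infested carrots (p = 0) lie outside the support of the beta distribution and are not used.

```python
import math


def info_criteria(m2ll, k, n_obs, n_subj):
    """Information criteria computed from -2 log likelihood (smaller is better).

    m2ll: -2 log likelihood; k: number of estimated parameters;
    n_obs: observations used (AICC); n_subj: number of subjects, here blocks
    (BIC, CAIC, HQIC).
    """
    log_s = math.log(n_subj)
    return {
        "AIC": m2ll + 2 * k,
        "AICC": m2ll + 2 * k * n_obs / (n_obs - k - 1),
        "BIC": m2ll + k * log_s,
        "CAIC": m2ll + k * (log_s + 1),
        "HQIC": m2ll + 2 * k * math.log(log_s),
    }


# Binomial: 32 fixed-effect parameters + 2 variance components; 96 observations.
binomial = info_criteria(541.85, k=34, n_obs=96, n_subj=3)
# Beta: one more parameter (the scale phi); 93 observations (an assumption, see above).
beta = info_criteria(-246.49, k=35, n_obs=93, n_subj=3)

for name in binomial:
    print(f"{name:<5}{binomial[name]:>10.2f}{beta[name]:>10.2f}")
```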

Adding the scale parameter ($\phi$) to the model changes the variance components and their standard errors (Table 6.14, part (a)), and, therefore, the F- and t-tests are also affected (part (b)). The estimated value of the scale parameter is $\hat{\phi} = 25.7070$. Table 6.14 lists the variance components under both the binomial and the beta models.

Table 6.15 Estimated means and standard errors on the model scale and the data scale

(a) Trt least squares means
Trt    Estimate   Standard error   DF   t-value    Pr > |t|   Mean     Standard error mean
Trt1   -1.2362    0.01768          2    -69.94     0.0002     0.2251   0.003083
Trt2   -1.9327    0.01768          2    -109.34    <0.0001    0.1264   0.001952

(b) Genotype least squares means
Genotype   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
G1         0.1524     0                57   Infty     <0.0001    0.5380    0
G10        -1.4143    0                57   -Infty    <0.0001    0.1956    0
G11        -1.8698    0                57   -Infty    <0.0001    0.1336    0
G12        -2.8971    0.03885          57   -74.58    <0.0001    0.05230   0.001925
G13        -1.4336    0                57   -Infty    <0.0001    0.1925    0
G14        -1.8761    0.1304           57   -14.39    <0.0001    0.1328    0.01502
G15        -1.8618    0                57   -Infty    <0.0001    0.1345    0
G16        -2.6686    0                57   -Infty    <0.0001    0.06485   0
G2         0.2225     0                57   Infty     <0.0001    0.5554    0
G3         -1.3329    0                57   -Infty    <0.0001    0.2087    0
G4         -1.5897    0                57   -Infty    <0.0001    0.1694    0
G5         -1.3696    0                57   -Infty    <0.0001    0.2027    0
G6         -2.0173    0                57   -Infty    <0.0001    0.1174    0
G7         -1.7001    0.1356           57   -12.53    <0.0001    0.1545    0.01771
G8         -1.7161    0                57   -Infty    <0.0001    0.1524    0
G9         -1.9796    0                57   -Infty    <0.0001    0.1214    0

The treatment means (part (a)) and genotype means (part (b)) are presented in Table 6.15. The estimates on the model (logit) scale are listed under the column "Estimate" with their standard errors under "Standard error," and the values on the data scale are listed under the column "Mean" with their standard errors under "Standard error mean." The least squares means for genotypes are inconsistent: most t-values are infinite and most standard errors are 0, so other estimation alternatives should be sought.
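The relation between the "Estimate" and "Mean" columns can be sketched as follows: the ilink option applies the inverse logit to the estimate, and the data-scale standard errors in Table 6.15 agree with the delta method, SE × π(1 - π). The helper names are ours; this is an arithmetic check, not a reimplementation of GLIMMIX. It also shows why a model-scale standard error of 0 produces a data-scale standard error of 0.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: model (logit) scale -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def delta_se(eta, se_eta):
    """Delta-method standard error on the data scale: SE(eta) * pi * (1 - pi)."""
    pi = inv_logit(eta)
    return se_eta * pi * (1.0 - pi)


# (label, estimate, standard error) on the model scale, from Table 6.15.
rows = [
    ("Trt1", -1.2362, 0.01768),
    ("Trt2", -1.9327, 0.01768),
    ("G12", -2.8971, 0.03885),
    ("G1", 0.1524, 0.0),  # a model-scale SE of 0 gives a data-scale SE of 0
]
for label, estimate, se in rows:
    print(label, round(inv_logit(estimate), 4), round(delta_se(estimate, se), 6))
```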

In large samples, the binomial distribution is well approximated by the normal distribution. The two previous analyses, binomial and beta, are attractive because they are consistent with the nature of the data. However, because of the inconsistencies in the estimates of the genotype means (infinite t-values and zero standard errors of the mean), a more robust estimation approach could be used; in this case, we assume a normal distribution.

Assuming that the observed proportion has a normal distribution with mean $\mu_{ijk}$ and constant variance $\sigma^2$, the components of this model are as follows:


230 6 Generalized Linear Mixed Models for Proportions and Percentages 


Distribution: $pct_{ijk} \mid r_k, (r\alpha)_{ik} \sim \text{Normal}(\mu_{ijk}, \sigma^2)$
$r_k \sim N(0, \sigma_r^2)$, $(r\alpha)_{ik} \sim N(0, \sigma_{r\alpha}^2)$
Linear predictor: $\eta_{ijk} = \mu + \alpha_i + r_k + (r\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij}$
Link function: $\eta_{ijk} = \mu_{ijk}$ (identity)

Table 6.16 Results of the analysis of variance assuming a normal distribution

(a) Fit statistics
-2 Res log likelihood       -79.38
AIC (smaller is better)     -73.38
AICC (smaller is better)    -72.98
BIC (smaller is better)     -76.08
CAIC (smaller is better)    -73.08
HQIC (smaller is better)    -78.81
Generalized chi-square        0.60
Gener. chi-square/DF          0.01

(b) Covariance parameter estimates
Cov Parm      Estimate   Standard error
Bloque        0.000123   0.000742
Trt*bloque    0.000329   0.000925
Residual      0.009442   0.001724

(c) Type III tests of fixed effects
Effect          Pr > F
Genotype        <0.0001
Trt             0.0456
Genotype*Trt    0.0016


Similarly, in this example, the response variable used was $pct_{ijk} = y_{ijk}/N_{ijk}$. This response variable is not the same as the response variable used with the binomial distribution. The following SAS GLIMMIX commands fit a linear mixed model (LMM) for a split plot in a randomized complete block design with a normal response.


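proc glimmix data=spd_pct nobound;
   class Genotype Trt Block;
   model pct = Genotype|Trt;        /* default: normal distribution, identity link */
   random Block Block*Trt;          /* block and whole-plot (block x treatment) effects */
   lsmeans Genotype|Trt / lines;
run;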


Part of the results is shown in Table 6.16. The values of the fit statistics in part (a) are clearly lower than those obtained with the previous options, which suggests that the normal distribution is reasonable here, even though the response is a proportion. The estimated variance components in part (b) for blocks, blocks × treatment, and the residual (the mean squared error, MSE, which here matches Gener. chi-square/DF) are $\hat{\sigma}_r^2 = 0.000123$, $\hat{\sigma}_{r\alpha}^2 = 0.000329$, and $\hat{\sigma}^2 = \text{MSE} = 0.009442 \approx 0.01$, respectively.
Table 6.17 Means and standard errors for genotypes and treatments

(a) Genotype least squares means
Genotype   Estimate   Standard error   DF   t-value   Pr > |t|   Mean     Standard error mean
G1         0.5260     0.04086          60   12.87     <0.0001    0.5260   0.04086
G10        0.1340     0.04086          60   3.28      0.0017     0.1340   0.04086
G11        0.1522     0.04086          60   3.73      0.0004     0.1522   0.04086
G12        0.03332    0.04086          60   0.82      0.4179     0.0333   0.04086
G13        0.2026     0.04086          60   4.96      <0.0001    0.2026   0.04086
G14        0.1342     0.04086          60   3.28      0.0017     0.1342   0.04086
G15        0.1360     0.04086          60   3.33      0.0015     0.1360   0.04086
G16        0.05625    0.04086          60   1.38      0.1737     0.0562   0.04086
G2         0.5355     0.04086          60   13.11     <0.0001    0.5355   0.04086
G3         0.2139     0.04086          60   5.24      <0.0001    0.2139   0.04086
G4         0.1751     0.04086          60   4.28      <0.0001    0.1751   0.04086
G5         0.2035     0.04086          60   4.98      <0.0001    0.2035   0.04086
G6         0.1301     0.04086          60   3.18      0.0023     0.1301   0.04086
G7         0.1671     0.04086          60   4.09      0.0001     0.1671   0.04086
G8         0.1504     0.04086          60   3.68      0.0005     0.1504   0.04086
G9         0.1187     0.04086          60   2.90      0.0051     0.1187   0.04086

(b) Trt least squares means
Trt    Estimate   Standard error   DF   t-value   Pr > |t|   Mean     Standard error mean
Trt1   0.2478     0.01863          2    13.30     0.0056     0.2478   0.01863
Trt2   0.1358     0.01863          2    7.29      0.0183     0.1358   0.01863


The F-tests for the fixed effects of genotype, treatment, and their interaction provide statistically significant evidence of differences in the proportion of infested carrots (Table 6.16, part (c)). The least squares means for genotypes and treatments are reported in parts (a) and (b) of Table 6.17. The genotypes with the highest fraction of infested carrots were 1, 2, 3, 5, and 13, whereas genotypes 12 and 16 showed the lowest percentage of infested carrots. For treatments, the highest proportion of infested carrots was observed in treatment 1 (24.78%), compared with 13.58% in treatment 2.
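Because the layout is balanced, the normal-model least squares mean of each genotype should equal the simple average of its six plot proportions y/N. The sketch below (plain Python, using the counts as printed in Table 6.8; names are ours) computes these averages. It gives, for example, 0.5260 for G1, 0.5355 for G2, 0.1340 for G10, and 0.0333 for G12, matching Table 6.17(a); for G16 the average of the printed counts (about 0.0572) differs slightly from the reported 0.0562.

```python
# Infested/sampled counts from Table 6.8:
# treatment 1 (blocks 1-3) followed by treatment 2 (blocks 1-3).
COUNTS = {
    "G1": [(44, 53), (42, 48), (27, 51), (16, 60), (9, 52), (26, 54)],
    "G2": [(24, 48), (35, 42), (45, 52), (13, 44), (20, 48), (16, 53)],
    "G3": [(8, 49), (16, 49), (16, 50), (4, 52), (6, 51), (12, 43)],
    "G4": [(4, 51), (5, 42), (12, 46), (15, 52), (10, 56), (6, 48)],
    "G5": [(11, 52), (13, 51), (15, 44), (4, 51), (6, 43), (9, 46)],
    "G6": [(15, 50), (5, 49), (7, 50), (1, 51), (8, 49), (3, 54)],
    "G7": [(18, 52), (13, 47), (7, 47), (2, 52), (4, 52), (6, 52)],
    "G8": [(5, 47), (15, 49), (8, 50), (6, 56), (4, 50), (6, 42)],
    "G9": [(11, 52), (6, 45), (5, 51), (3, 54), (8, 51), (3, 53)],
    "G10": [(0, 51), (10, 39), (14, 48), (3, 50), (0, 50), (10, 51)],
    "G11": [(6, 52), (4, 46), (10, 37), (1, 52), (7, 38), (4, 48)],
    "G12": [(0, 52), (4, 55), (1, 40), (1, 50), (3, 50), (1, 45)],
    "G13": [(14, 45), (18, 43), (4, 40), (4, 51), (7, 46), (7, 45)],
    "G14": [(3, 52), (12, 53), (4, 55), (3, 52), (7, 48), (12, 49)],
    "G15": [(11, 52), (6, 54), (5, 49), (2, 50), (4, 46), (14, 53)],
    "G16": [(4, 53), (1, 40), (4, 52), (4, 56), (1, 44), (3, 42)],
}


def genotype_means(counts):
    """Average of the plot proportions y/N for each genotype."""
    return {g: sum(y / n for y, n in plots) / len(plots) for g, plots in counts.items()}


means = genotype_means(COUNTS)
for genotype, mean in sorted(means.items(), key=lambda item: item[1], reverse=True):
    print(f"{genotype:<4}{mean:.4f}")
```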

Based on the fixed effects tests, the genotype × treatment interaction effect on the proportion of infested carrots was statistically significant. Genotypes 9 and 16 showed higher susceptibility in treatment 1 than in treatment 2, whereas genotypes 5, 11, 13, and 15 showed the same proportion of infested carrots in both treatments (Fig. 6.6). On the other hand, the genotypes that showed higher resistance to the infestation levels were genotypes 1, 2, and 6, followed by genotypes 3, 4, 7, 8, 10, and 12.
Fig. 6.6 The average proportion of infested carrots in genotypes as a function of treatment


6.5 A Split-Split Plot in an RCBD: In Vitro Germination of Seeds


The growth of a plant in tissue culture can be explained by the combined effects of several factors, here denoted A, B, and C. The availability and efficient use of chemical resources (factors) is highly relevant when they are scarce or expensive. In this context, the combination of three reagents (A, B, and C), with reagent A at three levels and reagents B and C at two levels each, was tested on the in vitro germination of orchid seeds. The arrangement of the factor-level combinations is shown schematically below.


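Block 1
Whole plots (factor A):    A3        | A1        | A2
Subplots (factor B):       B1   B2   | B1   B2   | B2   B1
Sub-subplots (factor C):   each subplot is divided into two sub-subplots, one for each level of C (C1, C2)

Block 2 (same structure)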


In each factor combination, N orchid seeds were placed to germinate for a period of time. Let $y_{ijkl}$ be the number of seeds that germinated in the $l$th block at the $i$th level of factor A, the $j$th level of factor B, and the $k$th level of factor C. Since the observations are made at the sub-subplot level, conditional on the structural effects of the design, these observations have a variance associated with the sub-subplot. The statistical model for this experiment is given below:
Table 6.18 Number of seeds that germinated (Y) out of N seeds for each block and combination of the levels of factors A, B, and C

Block   A   B   C   Y    N
2       1   1   1   10   86
2       1   1   2   19   32
1       1   2   1   26   125
2       1   2   1   21   62
1       1   2   2   14   81
2       1   2   2   12   21
1       2   1   1   10   92
2       2   1   1   12   108
1       2   1   2   30   4
2       2   1   2   32   33
1       2   2   1   37   91
2       2   2   1   30   42
2       2   2   2   37   44
1       3   1   1   18   52
2       3   1   1   18   73
1       3   1   2   23   108
1       3   2   1   24   106
2       3   2   1   27   92
1       3   2   2   37   64
2       3   2   2   37   97

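Distribution: $y_{ijkl} \mid r_l, (r\alpha)_{il}, (r\alpha\beta)_{ijl} \sim \text{Binomial}(N_{ijkl}, \pi_{ijkl})$
$r_l \sim N(0, \sigma_r^2)$, $(r\alpha)_{il} \sim N(0, \sigma_{r\alpha}^2)$, $(r\alpha\beta)_{ijl} \sim N(0, \sigma_{r\alpha\beta}^2)$

Linear predictor:

$$\eta_{ijkl} = \mu + \alpha_i + r_l + (r\alpha)_{il} + \beta_j + (\alpha\beta)_{ij} + (r\alpha\beta)_{ijl} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$$

where blocks ($r_l$), blocks × A ($(r\alpha)_{il}$), and blocks × A × B ($(r\alpha\beta)_{ijl}$) are assumed to contribute to the variation such that $r_l \sim N(0, \sigma_r^2)$, $(r\alpha)_{il} \sim N(0, \sigma_{r\alpha}^2)$, and $(r\alpha\beta)_{ijl} \sim N(0, \sigma_{r\alpha\beta}^2)$, respectively; $\gamma_k$ is the fixed effect of factor C, and $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, and $(\alpha\beta\gamma)_{ijk}$ are its interactions with factors A and B. This model uses the linear predictor $\eta_{ijkl}$ to model the mean of the observations through $\pi_{ijkl}$.

Link function: $\text{logit}(\pi_{ijkl}) = \log\left(\frac{\pi_{ijkl}}{1 - \pi_{ijkl}}\right) = \eta_{ijkl}$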


Table 6.18 shows the data obtained from this experiment.

Table 6.19 presents the analysis of variance and shows the sources of variation 
and degrees of freedom for this experimental design. 
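Table 6.19 can be verified in the same way. The sketch below (plain Python; the function name is ours) computes the degrees of freedom of a split-split plot in an RCBD from the number of blocks (r) and the numbers of levels of A, B, and C, and checks that they sum to the total. For r = 2, a = 3, b = 2, and c = 2, the three error terms have 2, 3, and 6 degrees of freedom, which match the denominator degrees of freedom for A, B, and C in Table 6.20.

```python
def split_split_plot_df(r, a, b, c):
    """Degrees of freedom for a split-split plot in an RCBD.

    r: blocks; a, b, c: numbers of levels of the whole-plot (A),
    subplot (B), and sub-subplot (C) factors.
    """
    df = {
        "Blocks": r - 1,
        "A": a - 1,
        "Error(a)": (r - 1) * (a - 1),          # Blocks x A (whole-plot error)
        "B": b - 1,
        "A*B": (a - 1) * (b - 1),
        "Error(b)": a * (b - 1) * (r - 1),      # subplot error: B x Blocks within A
        "C": c - 1,
        "A*C": (a - 1) * (c - 1),
        "B*C": (b - 1) * (c - 1),
        "A*B*C": (a - 1) * (b - 1) * (c - 1),
        "Error(c)": a * b * (c - 1) * (r - 1),  # sub-subplot error
    }
    total = r * a * b * c - 1
    # Sanity check: the components must add up to the total.
    assert sum(df.values()) == total
    df["Total"] = total
    return df


if __name__ == "__main__":
    # In vitro germination example: 2 blocks, 3 levels of A, 2 of B, 2 of C.
    for source, value in split_split_plot_df(r=2, a=3, b=2, c=2).items():
        print(f"{source:<10}{value:>4}")
```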

The following SAS GLIMMIX program allows a GLMM with a split-split plot 
structure to be fitted in an RCBD with a binomial response. 
Table 6.19 Sources of variation and degrees of freedom for the randomized block design with an arrangement of treatments under the split-split-plot structure

Sources of variation        Degrees of freedom
Blocks                      r - 1 = 2 - 1 = 1
Factor A                    a - 1 = 3 - 1 = 2
Error(a) (Bloque*A)         (r - 1)(a - 1) = 2
Factor B                    b - 1 = 2 - 1 = 1
A*B                         (a - 1)(b - 1) = 2
Error(b) (A*B(Bloque))      a(b - 1)(r - 1) = 3 × 1 × 1 = 3
Factor C                    c - 1 = 2 - 1 = 1
A*C                         (a - 1)(c - 1) = 2
B*C                         (b - 1)(c - 1) = 1
A*B*C                       (a - 1)(b - 1)(c - 1) = 2
Error(c)                    ab(c - 1)(r - 1) = 3 × 2 × 1 × 1 = 6
Total                       r × a × b × c - 1 = 2 × 3 × 2 × 2 - 1 = 23


proc glimmix data=germ nobound method=laplace;
   class Block A B C;
   model Y/N = A|B|C / dist=binomial link=logit;
   random Block Block*A Block*A*B;    /* block, whole-plot (block x A), and subplot (block x A x B) effects */
   lsmeans A|B|C / lines ilink;
run;


Part of the output is shown in Table 6.20. The conditional statistic Pearson's chi-square/DF = 1.81 (part (a)) indicates that there is overdispersion in the dataset, since this value is greater than 1. The estimated variance components in part (b) correspond to blocks, blocks × factor A, and blocks × factor A × factor B, which are $\hat{\sigma}_r^2 = 0.0752$, $\hat{\sigma}_{r\alpha}^2 = 0.0885$, and $\hat{\sigma}_{r\alpha\beta}^2 = 0.0221$, respectively. The type III tests of fixed effects are shown in part (c). The tests are not significant for factors A and B or for the A × B interaction (A, P = 0.1917; B, P = 0.0897; A × B, P = 0.6262), whereas factor C and the interactions A × C, B × C, and A × B × C are significant at the 5% level.

Since there is overdispersion in the dataset (Pearson's chi-square/DF = 1.81), the binomial distribution does not provide a good fit. An alternative is to model these data with the beta distribution. Under this assumption, let the response variable be $p_{ijkl} = y_{ijkl}/N_{ijkl}$, the proportion of seeds that germinated; then $p_{ijkl}$ is assumed to have a beta distribution, rather than assuming a binomial distribution for the success count $y_{ijkl}$ out of a total of $N_{ijkl}$ Bernoulli trials.

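The components of the model are listed below:

Distribution: $p_{ijkl} \mid r_l, (r\alpha)_{il}, (r\alpha\beta)_{ijl} \sim \text{Beta}(\pi_{ijkl}, \phi)$, with $\phi$ as the scale parameter
$r_l \sim N(0, \sigma_r^2)$, $(r\alpha)_{il} \sim N(0, \sigma_{r\alpha}^2)$, $(r\alpha\beta)_{ijl} \sim N(0, \sigma_{r\alpha\beta}^2)$

Linear predictor:

$$\eta_{ijkl} = \mu + \alpha_i + r_l + (r\alpha)_{il} + \beta_j + (\alpha\beta)_{ij} + (r\alpha\beta)_{ijl} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$$

Link function: $\text{logit}(\pi_{ijkl}) = \log\left(\frac{\pi_{ijkl}}{1 - \pi_{ijkl}}\right) = \eta_{ijkl}$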
Table 6.20 Results of the analysis of variance of the RCBD in the split-split plot under the binomial distribution

(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)     146.19
Pearson's chi-square           43.49
Pearson's chi-square/DF         1.81

(b) Covariance parameter estimates
Cov Parm       Estimate   Standard error
Bloque         0.07521    0.1180
Bloque*A       0.08847    0.09319
Bloque*A*B     0.02205    0.04258

(c) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
A        2        2        4.22      0.1917
B        1        3        6.12      0.0897
A*B      2        3        0.55      0.6262
C        1        6        65.73     0.0002
A*C      2        6        11.68     0.0085
B*C      1        6        29.38     0.0016
A*B*C    2        6        31.69     0.0006


The following SAS commands fit a GLMM for a split-split plot in a randomized complete block design, assuming a beta distribution for the response variable.


/* Define the proportion of germinated seeds before fitting the beta model */
data germ;
   set germ;
   p = Y/N;
run;

proc glimmix data=germ nobound method=laplace;
   class Block A B C;
   model p = A|B|C / dist=beta;
   random Block Block*A Block*A*B;
   lsmeans A|B|C / lines ilink;
run;


Part of the results under the beta distribution is listed in Table 6.21. The fit statistic for the conditional model in part (a) (Pearson's chi-square/DF = 1.01) indicates that the overdispersion has been removed and that the beta distribution is a good model for this dataset. Part (b) shows the variance component estimates for blocks, blocks × A, and blocks × A × B ($\hat{\sigma}_r^2 = -0.157$, $\hat{\sigma}_{r\alpha}^2 = -0.05558$, and $\hat{\sigma}_{r\alpha\beta}^2 = -0.227$, respectively; negative estimates are possible because the NOBOUND option removes the nonnegativity constraints on the covariance parameters) and the estimated scale parameter ($\hat{\phi} = 19.2789$). According to the type III tests of fixed effects in part (c), the main effect of factor C (P = 0.0128) and the A × B × C interaction (P = 0.0424) are statistically significant at the 5% level.

The estimates for the A × B × C interaction are shown in Table 6.22 on the model scale under the "Estimate" column and as probabilities on the data scale under the "Mean" column, with their corresponding standard errors under the "Standard error mean" column.
Table 6.21 Results of the analysis of variance of the RCBD in the split-split plot structure under the beta distribution

(a) Fit statistics for conditional distribution
-2 Log L (p | r. effects)     -37.51
Pearson's chi-square           21.31
Pearson's chi-square/DF         1.01

(b) Covariance parameter estimates
Cov Parm       Estimate   Standard error
Bloque         -0.1570    .
Bloque*A       -0.05558   .
Bloque*A*B     -0.2270    .
Scale          19.2789    5.8703

(c) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
A        2        2        1.21      0.4521
B        1        2        0.00      0.9687
A*B      2        2        1.08      0.4799
C        1        4        18.34     0.0128
A*C      2        4        1.50      0.3257
B*C      1        4        6.56      0.0626
A*B*C    2        4        7.72      0.0424


Table 6.22 Estimated least squares means on the model scale ("Estimate" column) and the data scale ("Mean" column)

A*B*C least squares means
A   B   C   Estimate   Standard error   Pr > |t|   Mean     Standard error mean
1   1   1   -0.3769    0.3194           0.3034     0.4069   0.07709
1   1   2   0.9506     0.3445           0.0509     0.7212   0.06927
1   2   1   0.1721     0.3147           0.6135     0.5429   0.07810
1   2   2   0.7010     0.3308           0.1014     0.6684   0.07331
2   1   1   -0.6521    0.3296           0.1190     0.3425   0.07422
2   1   2   2.9148     0.8071           0.0225     0.9486   0.03937
2   2   1   0.7430     0.4699           0.1890     0.6776   0.1026
2   2   2   0.4056     0.4515           0.4198     0.6000   0.1084
3   1   1   0.2695     0.3161           0.4419     0.5670   0.07761
3   1   2   0.2752     0.3163           0.4334     0.5684   0.07759
3   2   1   0.1236     0.3143           0.7143     0.5309   0.07827
3   2   2   1.1726     0.3614           0.0315     0.7636   0.06523

Fig. 6.7 The average seed germination rate (vertical axis: average germination rate; horizontal axis: A × B × C interaction)


The simple effects show that the best combination of factor levels was A2 × B1 × C2, which had the highest proportion of germinated seeds, followed by the combinations A1 × B1 × C2 and A3 × B2 × C2; lower proportions were observed for the combinations A1 × B2 × C2, A2 × B2 × C1, and A2 × B2 × C2 (Fig. 6.7). Finally, the combination A2 × B1 × C1 showed the lowest proportion of seed germination.


6.6 Alternative Link Functions for Binomial Data 


In previous chapters, we used proc GLIMMIX with binomial data, which by default
uses the “logit” link function. However, in certain applications with binomial data,
other link functions are acceptable, either because they are easier to interpret or
because, for certain binomial datasets, the logit link cannot accurately model the data
and, as a result, produces biased (misleading) results. In this section, we consider two
alternatives to the logit link for binomial data: the “probit” link and the
complementary log-log link.

The probit model is also used to model dichotomous (Bernoulli) or binomial (sum
of Bernoulli trials) responses. For this model, the link function, called the probit link,
uses the inverse of the cumulative distribution function of a standard normal
distribution to transform probabilities to the standard normal scale. That is,
$\Phi^{-1}(\pi_i) = \eta_i$, which implies that $\pi_i = \Phi(\eta_i)$, where $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$.
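As a minimal sketch (in Python rather than SAS, and not part of the GLIMMIX workflow), the probit link and its inverse can be evaluated with the standard-library `statistics.NormalDist`, whose `inv_cdf` and `cdf` methods give $\Phi^{-1}$ and $\Phi$. The helper names `probit` and `inv_probit` are our own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(p: float) -> float:
    """Probit link: eta = Phi^{-1}(p), defined for 0 < p < 1."""
    return STD_NORMAL.inv_cdf(p)


def inv_probit(eta: float) -> float:
    """Inverse probit link: p = Phi(eta), always between 0 and 1."""
    return STD_NORMAL.cdf(eta)


for p in (0.1, 0.5, 0.9):
    eta = probit(p)
    print(f"p = {p:.1f}  eta = {eta:+.4f}  back-transformed = {inv_probit(eta):.4f}")
```

Because $\Phi$ is symmetric about zero, the probabilities 0.1 and 0.9 map to linear-predictor values of equal magnitude and opposite sign, and p = 0.5 maps to 0.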

The use of the probit regression model dates back to Bliss (1934). Bliss was 
interested in finding an effective pesticide to control insects that fed on grape leaves. 
He discovered that the relationship between the response and a dose of pesticide was
sigmoid, and he applied the probit link function to transform the dose-response 
curve from a sigmoid to a linear relationship. 

The complementary log-log link, defined as $\eta_i = \log(-\log(1 - \pi_i))$, whose
inverse is $\pi_i = 1 - e^{-e^{\eta_i}}$, is useful for data in which most of the probabilities
are near zero or near one. For small values of $\pi_i$, the complementary log-log
transformation produces results very similar to those of the logit link. As the
probability increases, the transformation approaches infinity more slowly than the
probit or logit link.
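The following Python sketch (helper names are our own, not part of GLIMMIX) evaluates the logit and complementary log-log links at a few probabilities. It illustrates the two properties stated above: near zero the two links almost coincide, whereas near one the complementary log-log grows far more slowly.

```python
import math


def logit(p: float) -> float:
    """Logit link: eta = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def cloglog(p: float) -> float:
    """Complementary log-log link: eta = log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta: float) -> float:
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Compare both links from probabilities near zero to probabilities near one
for p in (0.01, 0.05, 0.50, 0.95, 0.999):
    print(f"p = {p:<6}  logit = {logit(p):+8.4f}  cloglog = {cloglog(p):+8.4f}")
```

At p = 0.01 the two links differ by only about 0.005, whereas at p = 0.999 the logit is about 6.91 and the complementary log-log only about 1.93. Unlike the logit, the complementary log-log is not symmetric: p = 0.5 maps to log(log 2), about -0.37, rather than 0.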


6.6.1 Probit Link: A Split-Split Plot in an RCBD 
with a Binomial Response 


This example uses the dataset of the split-split plot in an RCBD (Exercise 6.8.5), which
was previously modeled with the “logit” link. In this exercise, we fit the same dataset
using the “probit” link and compare and contrast the results with those obtained
using the logit link. The components of the GLMM are identical to those in
Example 6.5, except for the link function. That is, we replace:


Link function: $\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \eta_{ijk}$ by $\Phi^{-1}(\pi_{ijk}) = \eta_{ijk}$


The following GLIMMIX syntax implements the fitting of the binomial data 
using the link function probit. 


proc glimmix data=germ nobound method=laplace; 
class Block A B C;

model Y/N = A|B|C/link=probit; 

random block block*A block*A*B; 

lsmeans A|B|C/lines ilink; 

run; 


Table 6.23 shows part of the results under the binomial distribution with the
“probit” link function. Parts (a) and (b) show the mean squared error (Pearson's
chi-square/DF) and the variance component estimates for blocks, whole plots, and
subplots; these values are positive, unlike those obtained with the “logit” link. Since
the variance components are positive, this analysis makes more sense than the one
based on the logit link.

The type III tests of fixed effects are tabulated in part (c) of Table 6.23. The main
effects of factors A and B and the A*B interaction are not significant under either link
function. The main effect of factor C and the A*B*C interaction are significant under
both links, whereas the A*C and B*C interactions are significant only under the
“probit” link (P = 0.0075 and P = 0.0017, respectively).

The estimated probabilities ($\hat{\pi}_{ijk}$) and their respective standard errors are
presented in Table 6.24 for each of the combinations of the three factors, which
Table 6.23 Results of the analysis of variance of the RCBD in the split-split plot structure under
the binomial distribution using the “probit” link


(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)            146.43
Pearson's chi-square                  43.01
Pearson's chi-square/DF (MSE)          1.09
(b) Covariance parameter estimates
Cov Parm     Estimate   Standard error
Block        0.02411    0.03707
Block*A      0.02128    0.02830
Block*A*B    0.01617    0.01896
(c) Type III tests of fixed effects

                                      Probit    Logit
Effect   Num DF   Den DF   F-value    Pr > F    Pr > F
A        2        2        5.49       0.1541    0.4521
B        1        3        4.17       0.1339    0.9687
A*B      2        3        0.36       0.7226    0.4799
C        1        6        67.13      0.0002    0.0128
A*C      2        6        12.34      0.0075    0.3257
B*C      1        6        29.16      0.0017    0.0626
A*B*C    2        6        33.93      0.0005    0.0424


are very similar under both link functions. However, the average standard error
is slightly higher with the “logit” link function (average standard error of the mean = 0.0711)
than with the “probit” link (average standard error of the mean = 0.0693).
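The two averages quoted above can be checked directly from the “Standard error mean” columns of Table 6.24. The short Python sketch below takes the arithmetic mean of the 12 values for each link; nothing beyond the tabulated numbers is assumed.

```python
from statistics import mean

# "Standard error mean" columns of Table 6.24, in A*B*C order 111, 112, ..., 322
se_probit = [0.05050, 0.08296, 0.06746, 0.07798, 0.03805, 0.06338,
             0.08306, 0.08327, 0.07196, 0.06868, 0.06452, 0.08017]
se_logit = [0.04796, 0.08767, 0.06896, 0.08053, 0.03409, 0.06135,
            0.08845, 0.08847, 0.07418, 0.07041, 0.06589, 0.08553]

avg_probit = mean(se_probit)
avg_logit = mean(se_logit)
print(f"average SE, probit: {avg_probit:.4f}")  # 0.0693
print(f"average SE, logit:  {avg_logit:.4f}")   # 0.0711
```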


6.6.2 Complementary Log-Log Link Function: A Split Plot 
in an RCBD with a Binomial Response 


Researchers studied the effect of three different micro-minerals (A, B, and C) on the
attachment of explants of a commercial crop. Micro-mineral A was tested at three
levels (i = 1, 2, 3), and micro-minerals B and C at two levels each (j, k = 1, 2).
The combination of the different levels yielded a total of 12 treatment combinations.
Since the researchers wanted to study factor C with greater precision, a split-plot
treatment structure was designed in which micro-minerals A and B were placed in the
whole plot (a large plot) and micro-mineral C in the subplot (a small plot). The
treatment combinations were arranged in an RCBD with two blocks (r = 1, 2). The
outcome of interest was the number of live plants (y) out of the total number of plants growing in the
Table 6.24 Means and standard errors using the probit and logit link functions


A*B*C least squares means 


Probit Logit 
A B C Mean Standard error mean Mean Standard error mean 
1 1 1 0.1543 0.05050 0.1494 0.04796 
1 1 2 0.3723 0.08296 0.3780 0.08767 
1 2 1 0.2724 0.06746 0.2694 0.06896 
1 2 2 0.2954 0.07798 0.2953 0.08053 
2 1 1 0.1023 0.03805 0.09593 0.03409 
2 1 2 0.8255 0.06338 0.8292 0.06135 
2 2 1 0.5684 0.08306 0.5703 0.08845 
2 2 2 0.5529 0.08327 0.5530 0.08847 
3 1 1 0.2844 0.07196 0.2844 0.07418 
3 1 2 0.2751 0.06868 0.2733 0.07041 
3 2 1 0.2568 0.06452 0.2563 0.06589 
3 2 2 0.4612 0.08017 0.4608 0.08553 


unit (n). The data can be found in the Appendix (Data: Commercial crop
explant attachment).
The GLMM for this experiment is described below (complementary log-log link):


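Distribution: $y_{ijkl} \mid r_l, r(\alpha\beta)_{ijl} \sim \text{Binomial}(N_{ijkl}, \pi_{ijkl})$

$r_l \sim N(0, \sigma^2_r)$, $r(\alpha\beta)_{ijl} \sim N(0, \sigma^2_{r(\alpha\beta)})$

Linear predictor: $\eta_{ijkl} = \mu + r_l + \alpha_i + \beta_j + (\alpha\beta)_{ij} + r(\alpha\beta)_{ijl} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$,
where blocks ($r_l$) and blocks × (A × B) ($r(\alpha\beta)_{ijl}$) are assumed to contribute to the
variation such that $r_l \sim N(0, \sigma^2_r)$ and $r(\alpha\beta)_{ijl} \sim N(0, \sigma^2_{r(\alpha\beta)})$, respectively.

Link function: $\log(-\log(1 - \pi_{ijkl})) = \eta_{ijkl}$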


The following GLIMMIX code fits the binomial proportions with the complementary
log-log link function in an RCBD.


proc glimmix data=spp nobound method=laplace; 
class block A B C;

model y/n = A|B|C/link=cloglog;

random block block(A*B);

lsmeans A|B|C/lines ilink; 

run; 


The “link=cloglog” option specifies that proc GLIMMIX fits the model using the
complementary log-log link function. The “lsmeans A|B|C/lines ilink” statement
requests estimates of the linear predictors $\eta_{ijk}$ for each combination of factor levels;
the “lines” option displays pairwise comparisons of these estimates as letter groupings,
and the “ilink” option applies the inverse link to report them, with their standard errors,
on the data (probability) scale. Part of the output is shown below. Table 6.25 shows the
variance component estimates of blocks and blocks (A × B) using alternative link functions. Under
Table 6.25 Variance component estimates using the same distribution but a different link function


Covariance parameter estimates


              Log-log                     Logit                       Probit
Cov Parm      Estimate   Standard error   Estimate   Standard error   Estimate   Standard error
Block         0.05808    0.07112          0.08144    0.1042           0.02676    0.03494
Block(A*B)    0.05065    0.03121          0.09203    0.05754          0.03374    0.02111


Table 6.26 Type III tests of fixed effects using the same distribution but with a different link
function


Type III tests of fixed effects


                           Log-log            Logit              Probit
Effect   Num DF   Den DF   F-value   Pr > F   F-value   Pr > F   F-value   Pr > F
A        2        5        6.27      0.0318 | 8.17      0.0266
B        1        5        4.85      0.1370 | 2.81      0.1543
A*B      2        5        0.65      0.7693 | 0.24      0.7971
C        1        6        68.84     0.0002 | 66.70     0.0002
A*C      2        6        11.94     0.0088 | 12.12     0.0078
B*C      1        6        27.51     0.0019 | 28.88     0.0017 | 28.77     0.0017
A*B*C    2        6        32.44     0.0006 | 32.36     0.0006 | 33.93     0.0005
Table 6.27 Fit statistics using the same distribution but a different link function

Fit statistics              Log-log   Logit    Probit
-2 Log likelihood           164.85    172.57   170.88
AIC (smaller is better)     192.85    200.57   198.88
AICC (smaller is better)    239.51    247.24   245.55
BIC (smaller is better)     174.55    182.27   180.59
CAIC (smaller is better)    188.55    196.27   194.59
HQIC (smaller is better)    154.58    162.31   160.62


the “probit” link, the variance components are smaller than those obtained with the
“log-log” and “logit” links.


The results of the hypothesis tests for the fixed effects, both main effects and
interactions, are shown in Table 6.26; the three link functions lead to similar conclusions.

One tool that might be useful in choosing which link function provides a better fit,
or best describes the variability of a dataset, is the set of model fit statistics. These
statistics indicate that the model with the complementary “log-log” link function
provides the best fit, since it has the smallest value of every criterion (Table 6.27).
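The criteria in Table 6.27 are penalized versions of -2 log likelihood. The Python sketch below recomputes them from the -2 log likelihood values with the usual formulas AIC = -2ℓ + 2d, AICC = -2ℓ + 2dn*/(n* - d - 1), BIC = -2ℓ + d log n, CAIC = -2ℓ + d(log n + 1), and HQIC = -2ℓ + 2d log log n. The counts d = 14 parameters (12 A*B*C fixed effects plus two variance components), n = 2 blocks, and n* = 24 observations are our assumptions, not stated in the text; they were chosen because they reproduce the tabulated values to within rounding.

```python
import math


def info_criteria(neg2ll: float, d: int, n: int, n_star: int) -> dict:
    """Penalized fit statistics computed from -2 log likelihood.

    d: number of parameters; n: number of subjects (blocks, assumed);
    n_star: number of observations used by AICC (assumed).
    """
    return {
        "AIC": neg2ll + 2 * d,
        "AICC": neg2ll + 2 * d * n_star / (n_star - d - 1),
        "BIC": neg2ll + d * math.log(n),
        "CAIC": neg2ll + d * (math.log(n) + 1),
        "HQIC": neg2ll + 2 * d * math.log(math.log(n)),
    }


# -2 log likelihood for each link, taken from Table 6.27
NEG2LL = {"log-log": 164.85, "logit": 172.57, "probit": 170.88}
D, N_BLOCKS, N_OBS = 14, 2, 24  # assumed counts (see text above)

fit = {link: info_criteria(v, D, N_BLOCKS, N_OBS) for link, v in NEG2LL.items()}
for link, stats in fit.items():
    print(link, {name: round(value, 2) for name, value in stats.items()})

best = min(fit, key=lambda link: fit[link]["AIC"])
print("smallest AIC:", best)  # log-log
```

Because d and n are the same for all three links, every criterion differs from -2 log likelihood by the same constant, so all of them rank the links in the same order as -2 log likelihood itself.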

Table 6.28 shows the maximum likelihood estimates ($\hat{\pi}_{ijk}$) for each of the link
functions and each combination of factor levels; the three links provide very similar
estimates. It is important to mention that the correct
Table 6.28 Means and standard errors using the same distribution but with a different link function


A*B*C least squares means


           Log-log                        Logit                          Probit
A  B  C    Mean     Standard error mean   Mean     Standard error mean   Mean     Standard error mean
1  1  1    0.1494   0.04259               0.1513   0.04732               0.1547   0.05030
1  1  2    0.3776   0.08554               0.3727   0.08510               0.3696   0.08223
1  2  1    0.2661   0.06257               0.2706   0.06744               0.2737   0.06718
1  2  2    0.3001   0.07718               0.2993   0.07951               0.2980   0.07789
2  1  1    0.1020   0.03079               0.1023   0.03451               0.1047   0.03829
2  1  2    0.8389   0.08212               0.8188   0.06189               0.8196   0.06375
2  2  1    0.5558   0.09578               0.5733   0.08633               0.5700   0.08251
2  2  2    0.5578   0.09596               0.5560   0.08635               0.5546   0.08273
3  1  1    0.2770   0.06780               0.2805   0.07192               0.2827   0.07131
3  1  2    0.2782   0.06574               0.2779   0.06929               0.2778   0.06855
3  2  1    0.2555   0.05987               0.2561   0.06416               0.2569   0.06410
3  2  2    0.4599   0.08735               0.4610   0.08331               0.4609   0.07965


specification of the linear predictor and of the distribution of the response variable
are the most important elements for obtaining a good fit.


6.7 Percentages 


In this section, we consider proportions that have been calculated from discrete
counts, for example, the number of infected plants in treatment i out of a total of $N_i$
plants, which are likely to follow a binomial distribution. This class of models allows the
response to arise from different distributions and probabilities.


6.7.1 RCBD: Dead Aphid Rate 


An experiment was designed to study the effect of conidial density on the transmis-
sion of a fungus that attacks aphids. Aphid carcasses killed by the fungus, and from
which the fungus released spores, were placed on bean plants at three densities
(A = 1, B = 5, or C = 10 carcasses per plant) to provide different doses of fungal
conidia. Densities were assigned to individual bean plants in a completely random-
ized design with six replicates. A total of N = 20 live, uninfected aphids were placed
on each plant together with a ladybug that was allowed to forage on the plant to
facilitate the transfer of conidia between the carcasses and the live aphids. For each
plant, the number of aphids infected with the fungus (n) was counted, and the
proportion of aphids infected with the fungus was calculated 7 days after the
Table 6.29 Proportion of infested aphids

Plant   Density   p_ij
1       C         0.34299
2       A         0.16659
3       B         0.47004
4       C         0.62481
5       B         0.21926
6       B         0.16659
7       C         0.47502
8       C         0.52747
9       A         0.41581
10      B         0.42556
11      A         0.19466
12      A         0.34299
13      C         0.677
14      C         0.76674
15      A         0.13124
16      B         0.58419
17      B         0.38225
18      A         0.28905

Table 6.30 Sources of variation and degrees of freedom

Sources of variation   Degrees of freedom
Trt                    t - 1 = 3 - 1 = 2
Error                  t(r - 1) = 3(6 - 1) = 15
Total                  t × r - 1 = 18 - 1 = 17


inoculum was placed. Table 6.29 shows the proportion of infected aphids
($p_{ij} = n_{ij}/N$, with N = 20) for each of the conidial concentrations (densities) tested.

The sources of variation and degrees of freedom for this experiment are shown in 
Table 6.30. 

The components of the GLMM having a beta response are listed below: 


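Distributions: $p_{ij} \mid \text{density(plant)}_{ij} \sim \text{Beta}(\pi_{ij}, \phi)$

$\text{density(plant)}_{ij} \sim N(0, \sigma^2_{\text{density(plant)}})$

Linear predictor: $\eta_{ij} = \mu + \text{density}_i + \text{density(plant)}_{ij}$; $i = 1, 2, 3$; $j = 1, \ldots, 6$
Link function: $\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \text{logit}(\pi_{ij}) = \eta_{ij}$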

The following GLIMMIX program fits a GLMM in a completely randomized
design with a beta distribution. Here, density corresponds to the variable conc_ino.


proc glimmix data=thumbs nobound method=laplace;
class plant conc_ino;

model p = conc_ino /dist=beta link=logit;

random conc_ino(plant);

lsmeans conc_ino/lines ilink;
run;


Table 6.31 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 Log L (p | r. effects)     -24.13
Pearson's chi-square           18.45
Pearson's chi-square/DF         1.02
(b) Covariance parameter estimates
Cov Parm           Estimate   Standard error
Conc_Ino(Plant)    -0.1833    .
Scale              12.9999    4.1954
(c) Type III tests of fixed effects
Effect     Num DF   Den DF   F-value   Pr > F
Conc_Ino   2        15       8.25      0.0038


Table 6.32 Means and standard errors on the model scale and the data scale


Conc_Ino least squares means


Conc_Ino   Estimate   Standard error   DF   t-value   Pr > |t|   Mean     Standard error mean
A          -1.0340    0.2438           15   -4.24     0.0007     0.2623   0.04717
B          -0.5282    0.2246           15   -2.35     0.0328     0.3709   0.05241
C           0.2775    0.2197           15    1.26     0.2259     0.5689   0.05388


Part of the results is shown in Table 6.31. The value of the conditional fit statistic
in part (a), Pearson's chi-square/DF = 1.02, indicates that there is no evidence of
overdispersion in the data and that the beta distribution is a good model for this
dataset. The estimated variance of the nested inoculum density (plant) effect is
$\hat{\sigma}^2_{\text{conc\_ino(plant)}} = -0.1833$ (a negative estimate is possible because the nobound option
removes the nonnegativity constraint on the variance components), and the estimated
scale parameter is $\hat{\phi} = 12.9999$; both are tabulated in part (b). In part (c) of the same
table, the type III tests of fixed effects indicate that the density (concentration) of the
inoculum has a significant effect (P = 0.0038) on the proportion of aphids infected
with the fungus.

The values under the “Estimate” column are the estimated means on the model
(logit) scale, whereas the “Mean” column shows the estimated mean proportions on
the data scale with their respective standard errors (Table 6.32). These estimates
were obtained with the lsmeans statement and its “ilink” option.
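To see how the “Mean” and “Standard error mean” columns relate to the model-scale columns, the Python sketch below applies the inverse logit to each estimate and approximates the data-scale standard error by the delta method, $\text{SE}(\hat{\pi}) \approx \hat{\pi}(1 - \hat{\pi})\,\text{SE}(\hat{\eta})$. We assume this is how the ilink standard errors are obtained; the reproduced values agree with Table 6.32 to about four decimal places, with small differences due to rounding of the printed estimates. The helper names are our own.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit: maps a model-scale estimate to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def to_data_scale(estimate: float, se: float) -> tuple:
    """Back-transform a logit-scale estimate and approximate its SE by the delta method."""
    p = inv_logit(estimate)
    return p, p * (1.0 - p) * se


# Model-scale estimates and standard errors from Table 6.32
TABLE_6_32 = {"A": (-1.0340, 0.2438), "B": (-0.5282, 0.2246), "C": (0.2775, 0.2197)}

results = {d: to_data_scale(est, se) for d, (est, se) in TABLE_6_32.items()}
for density, (mean, se_mean) in results.items():
    print(f"{density}: mean = {mean:.4f}, SE mean = {se_mean:.5f}")
```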

Figure 6.8 shows an increasing, roughly linear trend in the proportion of infested
aphids as conidial density increases. Conidial densities A and B produced statistically
similar proportions of infested aphids, and the highest proportion of infested aphids
was observed at density C.
Fig. 6.8 Proportion of aphids infected at different conidia concentration densities


6.7.2 RCBD: Percentage of Quality Malt 


An agro-industrial engineer is interested in studying the effect of germination time in 
minutes (48, 96, and 144) on the percentage of quality malt obtained from six 
sorghum varieties (Sorghum bicolor): Gambella 1107, Macia, Meko, Red Swazi,
Teshale, and 76T1#23 (Bekele et al. 2012). The percentage of quality malt (y) as a 
function of both factors is shown in Table 6.33. 

For this purpose, an RCBD was implemented with a treatment factorial structure 
(variety x germination time). The statistical model to analyze the dataset is the 
following: 


Distributions: $y_{ijk} \mid r_k \sim \text{Beta}(\pi_{ijk}, \phi)$; $i = 1, \ldots, 6$; $j = 1, 2, 3$; $k = 1, 2, 3$

$r_k \sim N(0, \sigma^2_{\text{block}})$, where $y_{ijk}$ is the kth percentage of malt quality observed for the ith
variety with the jth germination time.

Linear predictor: $\eta_{ijk} = \mu + r_k + \alpha_i + \beta_j + (\alpha\beta)_{ij}$, where $\mu$ is the overall mean, $\alpha_i$ is the
fixed effect due to variety i, $\beta_j$ is the fixed effect due to germination time j,
and $(\alpha\beta)_{ij}$ is the interaction effect between variety and germination time.

Link function: $\text{logit}(\pi_{ijk}) = \eta_{ijk}$


Table 6.34 shows the sources of variation and degrees of freedom for this
experiment.
The following GLIMMIX commands fit a GLMM with a beta response.


proc glimmix data=malting nobound method=laplace; 
class var_sorghum ger_time block; 

model p = var_sorghum|ger_time/dist=beta link=logit;
random block; 

lsmeans var_sorghum|ger_time/lines ilink; 

run; 
Table 6.33 Percentage of quality malt as a function of both factors (variety and germination time)
Variety Time Block y Variety Time Block y
Gambella T1 1 Red Swazi 1 21
Gambella T1 2 Red Swazi 2 15.09
Gambella T1 3 Red Swazi 3 24.84
Macia T1 1 Teshale 1 25.42
Macia T1 2 Teshale 2 26.86
Macia T1 3 Teshale 3 26.64
Meko T1 1 76 T1#23 1 23.69
Meko T1 2 76 T1#23 2 20.71
Meko T1 3 76 T1#23 3 26.14
Red Swazi T1 1 Gambella 1 12.45
Red Swazi T1 2 Gambella 2 15.34
Red Swazi T1 3 Gambella 3 17.32
Teshale T1 1 Macia 1 8.51
Teshale T1 2 Macia 2 8.15
Teshale T1 3 Macia 3 13.07
76 T1#23 T1 1 Meko 1 22.09
76 T1#23 T1 2 Meko 2 24.11
76 T1#23 T1 3 Meko 3 24.47
Gambella T2 1 Red Swazi 1 20.81
Gambella T2 2 Red Swazi 2 16.05
Gambella T2 3 Red Swazi 3 23.7
Macia T2 1 Teshale 1 26.42
Macia T2 2 Teshale 2 27.07
Macia T2 3 Teshale 3 28.01
Meko T2 1 76 T1#23 1 24.18
Meko T2 2 76 T1#23 2 19.58
Meko T2 3 76 T1#23 3 25.74
Table 6.34 Sources of variation and degrees of freedom

Sources of variation    Degrees of freedom
Blocks                  r - 1 = 3 - 1 = 2
Variety                 a - 1 = 6 - 1 = 5
Time_Germination        b - 1 = 3 - 1 = 2
Variety*germ time       (a - 1)(b - 1) = 10
Error                   (ab - 1)(r - 1) = 17 × 2 = 34
Total                   r × a × b - 1 = 54 - 1 = 53


Part of the results of the above program is shown in Table 6.35. In part (a), the
value of Pearson's chi-square/DF is 0.92, which indicates that the beta distribution is
a good distribution for modeling malt percentage, since the value of Pearson's
chi-square/DF is close to 1. The estimated variance due to blocks is
Table 6.35 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 Log L (p | r. effects)
Pearson's chi-square            49.66
Pearson's chi-square/DF          0.92
(b) Covariance parameter estimates
Cov Parm   Estimate   Standard error
Block      0.01210    0.01055
Scale      431.54     85.4922


(c) Type III tests of fixed effects
Effect                 Num DF   Den DF   F-value   Pr > F
Var_sorghum            5        34       106.51    <0.0001
Ger_time               2        34       0.26      0.7722
Var_sorghum*ger_time   10       34       1.08      0.4041


Table 6.36 Means and standard errors on the model scale and the data scale for sorghum varieties


Var_sorghum least squares means


Var_sorghum   Estimate   Standard error   DF   t-value   Pr > |t|   Mean      Standard error mean
76 T1#23      -1.2011    0.07401          34   -16.23    <0.0001    0.2313    0.01316
Gambella      -1.8898    0.07929          34   -23.83    <0.0001    0.1313    0.009042
Macia         -2.2067    0.08295          34   -26.60    <0.0001    0.09915   0.007409
Meko          -1.1201    0.07364          34   -15.21    <0.0001    0.2460    0.01366
Red Swazi     -1.3685    0.07493          34   -18.26    <0.0001    0.2029    0.01212
Teshale       -1.0025    0.07314          34   -13.71    <0.0001    0.2685    0.01436


$\hat{\sigma}^2_{\text{block}} = 0.012$ and the estimated scale parameter is $\hat{\phi} = 431.54$ (part (b)), whereas the
type III tests of fixed effects in part (c) show that sorghum variety has a
significant effect on malt quality percentage (P < 0.0001).

The least squares means on the model scale and the data scale for the factor 
variety are listed under the columns "Estimate" and “Mean” with their respective 
standard errors "Standard error" in Table 6.36. 

Figure 6.9 shows that Teshale produced the highest average proportion of quality malt
(0.2685 ± 0.01436), followed by the varieties Meko (0.2460 ± 0.01366) and
76 T1#23 (0.2313 ± 0.01316), whereas the variety Macia produced the
lowest (0.09915 ± 0.0074).


6.7.3 A Split Plot in an RCBD: Cockroach Mortality 
(Blattella germanica) 


An entomologist is interested in testing six isolates of insect pathogenic fungi: five 
obtained from different hosts and one already known isolate (Control) of a fungus 
Fig. 6.9 Percentage of quality malt of bicolor sorghum varieties


Table 6.37 Analysis of variance with sources of variation and degrees of freedom for this
experiment


Sources of variation   Degrees of freedom
Blocks                 r - 1 = 2 - 1 = 1
Isolation              a - 1 = 6 - 1 = 5
Block(Isolation)       a(r - 1) = 6
Age                    b - 1 = 3 - 1 = 2
Isolation*age          (a - 1)(b - 1) = 5 × 2 = 10
Error                  (a - 1)(b - 1)(r - 1) = 2 × 5 × 1 = 10
Total                  r × a × b - 1 = 2 × 6 × 3 - 1 = 35


with potential for biological control of a particular species of cockroach. To do so,
the entomologist decided to test these fungal isolates on three different insect ages
(age 1 = E1, age 2 = E2, and age 3 = E3). Each isolate was placed in a Petri
dish with 10 insects of a specific age. Each isolate-age combination was randomly assigned
to two blocks (Appendix: Data: Cockroaches).


The analysis of variance table with the sources of variation and degrees of freedom
for this experiment is shown in Table 6.37. The response variable
(percentage mortality) for this experiment is assumed to have a beta distribution.

The components that describe the model of this experiment are listed below: 


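Distributions: $y_{ijk} \mid r_k, r(\alpha)_{ik} \sim \text{Beta}(\pi_{ijk}, \phi)$; $i = 1, \ldots, 6$; $j = 1, 2, 3$; $k = 1, 2$

$r_k \sim N(0, \sigma^2_r)$, $r(\alpha)_{ik} \sim N(0, \sigma^2_{r(\alpha)})$

Linear predictor: $\eta_{ijk} = \mu + r_k + \alpha_i + r(\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij}$

Link function: $\log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \eta_{ijk}$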
Table 6.38 Results of the analysis of variance of the RCBD with a factorial structure in treatments

(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)
Pearson's chi-square          34.02
Pearson's chi-square/DF        1.00
(b) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Isolation   Block     -0.03125   .
Scale                 24.1882    5.7925
(c) Type III tests of fixed effects
Effect          F-value   Pr > F
Isolation       16.48     0.0019
Age             30.01     <0.0001
Isolation*age   4.83      0.0102


The following GLIMMIX commands fit a GLMM with a beta response.


proc glimmix nobound method=laplace; 

class block Isolation Age; 

model y= Isolation|Age/dist=beta link=logit; 

random Isolation/subject=block; 

lsmeans Isolation|Age/slice=Isolation lines ilink;
run; 


Some of the output is listed in Table 6.38. The conditional statistic
Pearson's chi-square/DF = 1 indicates that the beta distribution is appropriate
for this dataset (part (a)). The variance component estimates are tabulated in part
(b): for isolates within blocks (block × isolation), the estimate is $\hat{\sigma}^2_{r(\alpha)} = -0.03125$, and the
estimated scale parameter is $\hat{\phi} = 24.1882$. Part (c) shows the type III tests of fixed effects for
the equality of means for isolate, insect age, and the interaction between both
factors; all three effects are significant for insect mortality (P ≤ 0.0102).

Tables 6.39 and 6.40 show the expected proportions for both factors, with their
respective standard errors, on the data scale under the “Mean” column. These
values arise from applying the inverse link to the estimates under “Estimate” on the model
scale. Table 6.39 shows the estimated average mortality probabilities for the isolates;
for example, for isolate A1, applying the inverse link to the linear predictor estimate
$\hat{\eta}_1 = 0.1722$ gives $\hat{\pi}_1 = 1/(1 + e^{-0.1722}) = 0.5429$. In the same manner, the
expected proportions for isolates A2 and A4 are $\hat{\pi}_2 = 0.6557$ and $\hat{\pi}_4 = 0.5762$, respec-
tively, whereas for the control $\hat{\pi}_{\text{control}} = 0.1157$.
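The back-transformation described above can be applied to every isolate at once. The Python sketch below (helper names are our own) uses the logit-scale estimates of Table 6.39; it reproduces the “Mean” column to within rounding and ranks the isolates by expected mortality.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Logit-scale estimates for each isolate (Table 6.39)
ESTIMATES = {"A1": 0.1722, "A2": 0.6442, "A3": -0.1489,
             "A4": 0.3073, "A5": -0.2023, "Control": -2.0339}

mortality = {isolate: inv_logit(eta) for isolate, eta in ESTIMATES.items()}

# Rank isolates from highest to lowest expected mortality
for isolate in sorted(mortality, key=mortality.get, reverse=True):
    print(f"{isolate:8s} {mortality[isolate]:.4f}")
```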

Regarding the age of the insect (Table 6.40), the expected average probability of
mortality was highest at age three (E3, adults), $\hat{\pi}_{E3} = 0.6435$,
whereas insects at age two (E2) were the most resistant to the isolates, with a
mortality of $\hat{\pi}_{E2} = 0.2598$.

In general, fungal isolates A1, A2, A3, and A4 showed an average mortality of
more than 75% for adult insects (E3), whereas isolates A1, A2, and A5 showed a
Table 6.39 Means and standard errors on the model scale and the data scale for isolation


Isolate least squares means


Isolate   Estimate   Standard error   DF   t-value   Pr > |t|   Mean     Standard error mean
A1         0.1722    0.1859           6     0.93     0.3900     0.5429   0.04614
A2         0.6442    0.2100           6     3.07     0.0220     0.6557   0.04740
A3        -0.1489    0.1952           6    -0.76     0.4746     0.4629   0.04853
A4         0.3073    0.2088           6     1.47     0.1915     0.5762   0.05098
A5        -0.2023    0.1806           6    -1.12     0.3053     0.4496   0.04468
Control   -2.0339    0.2418           6    -8.41     0.0002     0.1157   0.02473


Table 6.40 Means and standard errors on the model scale and the data scale for insect age


Age least squares means


Age   Estimate   Standard error   t-value   Pr > |t|   Mean     Standard error mean
E1    -0.1747    0.1310           -1.33     0.2120     0.4564   0.03251
E2    -1.0468    0.1374           -7.62     <0.0001    0.2598   0.02643
E3     0.5908    0.1634            3.61     0.0047     0.6435   0.03749


mortality rate of around 65% for cockroaches of age E1 (juvenile insects). On the
other hand, all isolates showed lower lethal effectiveness on insects of age E2
(Fig. 6.10).


6.7.4 A Split-Plot Design in an RCBD: Percentage Disease
Inhibition


A plant pathologist wishes to compare the response of two plant varieties to different
doses of a pesticide formulated to protect plants against a disease. Five
racks (blocks) were chosen to account for local variation within the greenhouse.
Each rack was divided into four sections, and the four pesticide levels (1, 2, 4, and
8 mg/L) were randomly assigned to the sections within each rack. One plant of each
variety was placed in each section of the rack. Of the two plant varieties, one was
susceptible, labeled S, and the other was resistant, labeled R (Table 6.41). The
response variable (y) is the percentage of disease inhibition in the plant.

The sources of variation and degrees of freedom for this experiment are shown in 
Table 6.42. 

Following the same reasoning used in the examples above, the components of the
GLMM with a beta response that models the observed disease inhibition proportion
($p_{ijk}$) under dose i with variety j in block k are listed as follows:


Distributions: $p_{ijk} \mid r_k, (r\alpha)_{ik} \sim \text{Beta}(\pi_{ijk}, \phi)$; $i = 1, \ldots, 4$; $j = 1, 2$; $k = 1, \ldots, 5$
$r_k \sim N(0, \sigma^2_r)$, $(r\alpha)_{ik} \sim N(0, \sigma^2_{r\alpha})$
Fig. 6.10 Cockroach mortality percentage


Table 6.41 Percentage of inhibition


Block Variety Dose y Block Variety Dose y

1 R 1 15.7 1 S 1 19.8
2 R 1 23.1 2 S 1 17.8
3 R 1 15.9 3 S 1 13.2
4 R 1 20.8 4 S 1 14.8
5 R 1 24.5 5 S 1 19.7
1 R 2 25.1 1 S 2 21.2
2 R 2 29.2 2 S 2 29.3
3 R 2 29.7 3 S 2 26
4 R 2 28.6 4 S 2 27.5
5 R 2 26.6 5 S 2 22
1 R 4 27.9 1 S 4 29.3
2 R 4 29.7 2 S 4 27.2
3 R 4 24 3 S 4 26
4 R 4 29.7 4 S 4 31.5
5 R 4 29.6 5 S 4 27.9
1 R 8 23.8 1 S 8 22.8
2 R 8 31.2 2 S 8 33
3 R 8 21.8 3 S 8 25.2
4 R 8 23.3 4 S 8 27.2
5 R 8 23.9 5 S 8 20.8
Table 6.42 Sources of variation and degrees of freedom


Sources of variation        Degrees of freedom
Blocks                      r - 1 = 5 - 1 = 4
Dose                        a - 1 = 4 - 1 = 3
Error(a) (Block*Dose)       (r - 1)(a - 1) = 12
Variety                     b - 1 = 2 - 1 = 1
Dose*variety                (a - 1)(b - 1) = 3
Error(b)                    a(b - 1)(r - 1) = 4 × 1 × 4 = 16
Total                       r × a × b - 1 = 5 × 4 × 2 - 1 = 39


Linear predictor: $\eta_{ijk} = \mu + r_k + \alpha_i + (r\alpha)_{ik} + \beta_j + (\alpha\beta)_{ij}$, where $r_k$ is the random block
effect, $\alpha_i$ is the fixed dose effect, $\beta_j$ is the fixed variety effect, $(r\alpha)_{ik}$ is the random
effect due to the block × dose interaction, and $(\alpha\beta)_{ij}$ is the fixed dose × variety
interaction effect.

Link function: $\text{logit}(\pi_{ijk}) = \eta_{ijk}$


The following GLIMMIX commands fit the GLMM.


proc glimmix nobound method=laplace;

class Variety dose block;

/* dist=beta requires 0 < y < 1, so y is assumed to be the inhibition percentage divided by 100 */
model y = dose variety dose*variety /dist=beta link=logit;
random Block Block*dose;

contrast 'Linear dose' dose -3 -1 1 3;

contrast 'Quadratic dose' dose 1 -1 -1 1;

contrast 'Cubic dose' dose -1 3 -3 1;

lsmeans variety|dose / slice=(variety dose) lines ilink;
ods output lsmeans=dose_means;

run;


The “contrast” statements in the program test which trend (linear, quadratic, or
cubic) the “dose” factor has on the percentage of disease inhibition. Part of the output
is shown in Table 6.43. The value of the conditional goodness-of-fit statistic
Pearson's chi-square/DF = 0.59 indicates that there is no evidence of overdispersion
and, therefore, the beta distribution is adequate to model this dataset (part (a)). The
variance component estimates in part (b) for block and block × dose are
$\hat{\sigma}^2_{\text{block}} = 0.004898$ and $\hat{\sigma}^2_{\text{block} \times \text{dose}} = 0.002372$, respectively. Finally, the type III tests in
part (c) provide sufficient statistical evidence of an effect of dose on disease inhibition
in plants (P = 0.0001), whereas neither variety (P = 0.2057) nor the dose × variety
interaction (P = 0.3337) shows a significant effect.

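Table 6.44 shows the polynomial contrasts for the effect of “dose”: the linear
(P = 0.0003) and quadratic (P = 0.0001) components are significant, whereas the cubic
component is not (P = 0.5948), indicating a significant quadratic (curved) effect of
dose on the percentage of disease inhibition.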

The inhibition percentage follows an almost linear trend as the dose increases from 1 to
4 mg/L in both varieties, but when the dose is higher than 4 mg/L, disease inhibition
decreases in both varieties (Fig. 6.11).
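As a rough, model-free check of this trend, the Python sketch below averages the raw inhibition percentages of Table 6.41 over blocks and varieties for each dose and applies the same orthogonal polynomial coefficients used in the contrast statements. This is only an illustration on the data scale: the GLIMMIX contrasts are tested on the logit scale of the fitted beta model, so the magnitudes are not comparable. These coefficients also treat the four doses as equally spaced, which holds for log2(dose) (0, 1, 2, 3) rather than for the doses 1, 2, 4, and 8 themselves.

```python
from statistics import mean

# Table 6.41: percentage of disease inhibition for blocks 1-5,
# by dose and variety (R = resistant, S = susceptible)
INHIBITION = {
    1: {"R": [15.7, 23.1, 15.9, 20.8, 24.5], "S": [19.8, 17.8, 13.2, 14.8, 19.7]},
    2: {"R": [25.1, 29.2, 29.7, 28.6, 26.6], "S": [21.2, 29.3, 26.0, 27.5, 22.0]},
    4: {"R": [27.9, 29.7, 24.0, 29.7, 29.6], "S": [29.3, 27.2, 26.0, 31.5, 27.9]},
    8: {"R": [23.8, 31.2, 21.8, 23.3, 23.9], "S": [22.8, 33.0, 25.2, 27.2, 20.8]},
}

DOSES = sorted(INHIBITION)
dose_means = [mean(INHIBITION[d]["R"] + INHIBITION[d]["S"]) for d in DOSES]

# Orthogonal polynomial coefficients for four equally spaced levels
CONTRASTS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

values = {name: sum(c * m for c, m in zip(coefs, dose_means))
          for name, coefs in CONTRASTS.items()}

for dose, m in zip(DOSES, dose_means):
    print(f"dose {dose}: mean inhibition = {m:.2f}%")
for name, value in values.items():
    print(f"{name:9s} contrast = {value:+.2f}")
```

The dose means rise from 18.53% at dose 1 to 28.28% at dose 4 and fall to 25.30% at dose 8, and the quadratic contrast is negative, which is consistent with the rise-then-fall pattern described above.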
Table 6.43 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 Log L (y | r. effects)     -184.32
Pearson's chi-square           23.63
Pearson's chi-square/DF         0.59
(b) Covariance parameter estimates
Cov Parm     Estimate   Standard error
Block        0.004898   .
Block*dose   0.002372   0.007513
Scale        205.52     67.7447
(c) Type III tests of fixed effects
Effect         Num DF   Den DF   F-value   Pr > F
Dose           3        12       17.67     0.0001
Variety        1        16       1.74      0.2057
Dose*variety   3        16       1.22      0.3337

Table 6.44 Polynomial contrasts
Label            Num DF   Den DF   F-value   Pr > F
Linear dose      1        12       25.48     0.0003
Quadratic dose   1        12       30.93     0.0001
Cubic dose       1        12       0.30      0.5948


6.7.5 Randomized Complete Block Design with a Binomial 
Response with Multiple Variance Components 


The dataset corresponds to an experiment conducted by Madden and Hughes
(1995) on the incidence of the disease caused by the fungus Plasmopara viticola on
grape plants (Vitis labrusca). Six treatments, where treatment 1 was the control,
were tested in a randomized block design with three blocks (b = 3), using three grape
plants (v = 3) to study the disease. On a single date in autumn, five sprouts (r = 5) were
randomly selected from each of the three grape plants, and the number of leaves with
at least one mildew lesion (m) was counted out of a total of n leaves. The number of
leaves per sprout ranged from 7 to 21. The data for this experiment can be found in
the Appendix (Data: Disease incidence on grape plants).

If the response variable p (the proportion of diseased leaves) were treated as normally
distributed, the statistical model describing disease incidence in this experiment would be:


$$p_{ijkl} = \mu + \tau_i + b_j + (bv)_{jk} + (bvr)_{jkl} + \varepsilon_{ijkl}$$
Fig. 6.11 Percentage of disease inhibition in both varieties


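where $p_{ijkl}$ is the proportion of diseased leaves in observation ijkl, $\mu$ is the intercept, $\tau_i$ is the fixed
effect of treatment i, $b_j$ is the random effect of block j assuming $b_j \sim N(0, \sigma^2_{\text{block}})$, $(bv)_{jk}$
is the random block × plant effect assuming $(bv)_{jk} \sim N(0, \sigma^2_{\text{block}\times\text{plant}})$, $(bvr)_{jkl}$
is the random block × plant × sprout effect assuming $(bvr)_{jkl} \sim N(0, \sigma^2_{\text{block}\times\text{plant}\times\text{sprout}})$,
and $\varepsilon_{ijkl}$ is the experimental error assuming $\varepsilon_{ijkl} \sim N(0, \sigma^2)$.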


For the disease incidence data, the assumption of a normal distribution for $p_{ijkl}$ is
not recommended. A good starting point for the analysis is to assume that the
observed number of diseased leaves on each sprout ($y_{ijkl}$) follows a binomial distribution
with parameters $\pi_{ijkl}$ (the probability that a leaf is diseased) and $n_{ijkl}$ (the total number of leaves on the sprout).

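Therefore, the components of the GLMM with a binomial distribution in the
response variable are as follows:

Distribution: $y_{ijkl} \mid b_j, (bv)_{jk}, (bvr)_{jkl} \sim \text{Binomial}(\pi_{ijkl}, n_{ijkl})$

$b_j \sim N(0, \sigma^2_{\text{block}})$, $(bv)_{jk} \sim N(0, \sigma^2_{\text{block}\times\text{plant}})$, $(bvr)_{jkl} \sim N(0, \sigma^2_{\text{block}\times\text{plant}\times\text{sprout}})$

Linear predictor: $\eta_{ijkl} = \mu + \tau_i + b_j + (bv)_{jk} + (bvr)_{jkl}$
Link function: $\text{logit}(\pi_{ijkl}) = \eta_{ijkl}$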
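The components above describe a hierarchy. The sketch below simulates counts from this binomial GLMM to make that hierarchy concrete. It draws one random effect per block, per plant within block, and per sprout within plant, then forms the linear predictor on the logit scale. Finally, it draws the number of diseased leaves from a binomial distribution conditional on those effects. The layout (3 blocks, 3 plants, 5 sprouts, 6 treatments, 7 to 21 leaves per sprout) follows the description above and the appendix data. All parameter values passed in are hypothetical, and the sketch assumes nonnegative variance components.

```python
import numpy as np

def simulate_binomial_glmm(tau, sigma2_block, sigma2_bv, sigma2_bvr,
                           mu=0.0, n_block=3, n_plant=3, n_sprout=5,
                           leaves=(7, 21), seed=1):
    """Simulate (b, v, r, t, m, n) rows from the binomial GLMM above.

    tau holds one logit-scale effect per treatment (hypothetical values);
    the variance components must be nonnegative.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for j in range(n_block):
        b_j = rng.normal(0.0, np.sqrt(sigma2_block))            # block effect
        for k in range(n_plant):
            bv_jk = rng.normal(0.0, np.sqrt(sigma2_bv))         # block x plant effect
            for l in range(n_sprout):
                bvr_jkl = rng.normal(0.0, np.sqrt(sigma2_bvr))  # block x plant x sprout effect
                for i, tau_i in enumerate(tau):
                    eta = mu + tau_i + b_j + bv_jk + bvr_jkl    # linear predictor
                    pi = 1.0 / (1.0 + np.exp(-eta))             # inverse logit link
                    n = int(rng.integers(leaves[0], leaves[1] + 1))  # leaves on the sprout
                    m = int(rng.binomial(n, pi))                # diseased leaves given the effects
                    rows.append((j + 1, k + 1, l + 1, i + 1, m, n))
    return rows

rows = simulate_binomial_glmm(tau=[0.7, -1.7, -2.0, -1.9, -1.8, -1.5],
                              sigma2_block=0.02, sigma2_bv=0.02, sigma2_bvr=0.2)
print(len(rows), rows[:3])
```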


The following GLIMMIX syntax fits a GLMM with a binomial response. 


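proc glimmix method=laplace nobound;    /* Laplace approximation; NOBOUND allows negative variance estimates */
   class v r b t;                        /* plant, sprout (shoot), block, treatment */

   model m/n = t / dist=bin;             /* events/trials: m diseased leaves out of n; logit link by default */

   random intercept v v*r / subject=b;   /* block, block x plant, block x plant x sprout */
   lsmeans t / lines ilink;              /* letter groupings; ILINK adds data-scale means */

run;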
Table 6.45 Results of the analysis of variance under the binomial distribution

(a) Fit statistics
−2 Log likelihood            723.17
AIC (smaller is better)      741.17
AICC (smaller is better)     741.87
BIC (smaller is better)      733.06
CAIC (smaller is better)     742.06
HQIC (smaller is better)     724.87

(b) Fit statistics for conditional distribution
−2 Log L (m | r. effects)    665.02
Pearson's chi-square         398.21
Pearson's chi-square/DF      1.47

(c) Covariance parameter estimates
Cov Parm    Subject   Estimate   Standard error
Intercept   b         −0.00408
v           b         0.01917
v*r         b         0.1960

(d) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
t        2        220      1837.99   <0.0001


Part of the results based on the aforementioned model is shown in Table 6.45. By 
default, proc GLIMMIX provides the fit statistics useful for selecting the best model 
from a group of models (part (a)). 
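Every criterion in part (a) adds a penalty to −2 log likelihood that depends on the number of parameters d. The sketch below uses the standard definitions of AIC, the small-sample AICC, BIC, the consistent CAIC, and the Hannan-Quinn HQIC, as we understand GLIMMIX to apply them. Two choices are assumptions on our part. First, n for BIC, CAIC, and HQIC is taken as the number of subjects (the 3 blocks). Second, the AICC correction uses the number of observations. The inputs d = 9 (six treatment parameters plus three covariance parameters) and 270 observations (3 blocks × 3 plants × 5 sprouts × 6 treatments) are also our own reading of this fit. With these assumptions, the sketch reproduces the binomial values in Table 6.45 to within rounding.

```python
import math

def glimmix_info_criteria(neg2ll, d, n_subjects, n_obs):
    """Penalized -2 log likelihood criteria (smaller is better).

    d          number of estimated parameters (fixed effects + covariance parameters)
    n_subjects number of subjects; used for BIC, CAIC and HQIC
    n_obs      number of observations; used for the AICC small-sample correction
    """
    return {
        "AIC": neg2ll + 2 * d,
        "AICC": neg2ll + 2 * d * n_obs / (n_obs - d - 1),
        "BIC": neg2ll + d * math.log(n_subjects),
        "CAIC": neg2ll + d * (math.log(n_subjects) + 1),
        "HQIC": neg2ll + 2 * d * math.log(math.log(n_subjects)),
    }

# Binomial fit: -2 log likelihood 723.17; assumed d = 9, 3 blocks, 270 observations
for name, value in glimmix_info_criteria(723.17, d=9, n_subjects=3, n_obs=270).items():
    print(f"{name}: {value:.2f}")
```

These criteria are comparable only between models fitted to the same response. Because AIC minus BIC equals d(2 − ln n), two printed criteria are enough to recover d and check which n a given output uses.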

In addition to accuracy considerations, the Laplace (or quadrature) analysis
allows us to obtain the “conditional distribution fit statistics,” specifically
Pearson's $\chi^2/\text{df}$. Recall that this statistic helps assess the goodness of fit of the
model. A value of $\chi^2/\text{df} \gg 1$ indicates overdispersion in the dataset, which may
arise because the linear predictor is incomplete or because the assumed distribution is not
suitable (misspecified) for the data. In part (b), the conditional Pearson's
$\chi^2/\text{df} = 1.47$, which indicates evidence of overdispersion. The variance component
estimates due to block, block × plant, and block × plant × sprout are tabulated in part (c),
whereas the type III tests of fixed effects (part (d)) indicate a significant
difference (P < 0.0001) between treatments.
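Pearson's statistic compares each observed count with its fitted value and scales the difference by the binomial variance. Dividing by the degrees of freedom gives the ratio discussed above. The sketch below shows this calculation; the function name and the toy numbers are hypothetical. In the actual analysis, the fitted probabilities come from the GLMM, conditional on the predicted random effects.

```python
def pearson_chi2_per_df(m, n, pi_hat, df):
    """Pearson chi-square for binomial counts and its ratio to df.

    m      observed successes (e.g., diseased leaves) per unit
    n      trials (total leaves) per unit
    pi_hat fitted probability for each unit
    """
    chi2 = 0.0
    for m_i, n_i, p_i in zip(m, n, pi_hat):
        expected = n_i * p_i
        variance = n_i * p_i * (1.0 - p_i)   # binomial variance
        chi2 += (m_i - expected) ** 2 / variance
    return chi2, chi2 / df

# Toy numbers (hypothetical): two sprouts with 10 leaves each, fitted probability 0.3
print(pearson_chi2_per_df([2, 5], [10, 10], [0.3, 0.3], df=2))
```

As a rough check, 398.21/270 ≈ 1.47. This matches part (b) of Table 6.45 if the divisor is the number of observations (270 by our count of the design). That is an inference from the printed numbers, not something the output states.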

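Because the binomial model shows overdispersion, an alternative is to model the proportion
of diseased leaves with a beta distribution. The components of the GLMM are as follows:

Distribution: $p_{ijkl} \mid b_j, (bv)_{jk}, (bvr)_{jkl} \sim \text{Beta}(\pi_{ijkl}, \phi)$

$b_j \sim N(0, \sigma^2_{\text{block}})$, $(bv)_{jk} \sim N(0, \sigma^2_{\text{block}\times\text{plant}})$, $(bvr)_{jkl} \sim N(0, \sigma^2_{\text{block}\times\text{plant}\times\text{sprout}})$
Linear predictor: $\eta_{ijkl} = \mu + \tau_i + b_j + (bv)_{jk} + (bvr)_{jkl}$
Link function: $\text{logit}(\pi_{ijkl}) = \eta_{ijkl}$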
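In this mean-precision form of the beta distribution, π is the mean and φ controls the spread. The variance is π(1 − π)/(1 + φ), so a larger φ means less variation around the mean. The usual shape parameters are a = πφ and b = (1 − π)φ. We understand this to be the parameterization GLIMMIX uses for `dist=beta`, but that is an assumption worth checking against the SAS documentation. The sketch below checks the variance identity numerically. The values π = 0.15 and φ = 9.84 are only illustrative.

```python
def beta_shapes(mean, phi):
    """Shape parameters (a, b) of a beta distribution with the given mean and precision phi."""
    return mean * phi, (1.0 - mean) * phi

def beta_variance(mean, phi):
    """Variance in the mean-precision form: mean * (1 - mean) / (1 + phi)."""
    return mean * (1.0 - mean) / (1.0 + phi)

def beta_variance_from_shapes(a, b):
    """Classical beta variance a*b / ((a + b)^2 (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))

a, b = beta_shapes(0.15, 9.84)
print(a / (a + b), beta_variance(0.15, 9.84), beta_variance_from_shapes(a, b))
```

Unlike the binomial, whose variance is fixed once π and n are known, the beta family estimates φ from the data. That extra parameter is what allows the model to absorb the overdispersion seen in the binomial fit.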
Table 6.46 Results of the analysis of variance under the beta distribution

(a) Fit statistics
−2 Log likelihood            −231.10
AIC (smaller is better)      −211.10
AICC (smaller is better)     −209.30
BIC (smaller is better)      −220.11
CAIC (smaller is better)     −210.11
HQIC (smaller is better)     −229.22

(b) Fit statistics for conditional distribution
−2 Log L (m | r. effects)    −231.10
Pearson's chi-square         136.55
Pearson's chi-square/DF      1.07

(c) Covariance parameter estimates
Cov Parm     Subject   Estimate   Standard error
Intercept    b         0
v            b         −0.2215
v*r          b         −0.1843
Scale (φ)              9.8397     1.1926

(d) Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
t        2        220      1837.99   <0.0001


The following SAS commands fit a GLMM under a beta distribution.


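proc GLIMMIX method=laplace nobound;   /* Laplace approximation; negative variance estimates allowed */
   class v r b t;                       /* plant, sprout (shoot), block, treatment */

   /* pct = m/n is assumed to have been computed beforehand (e.g., in a DATA step); */
   /* the beta distribution requires 0 < pct < 1                                    */
   model pct = t / dist=beta link=logit;
   random intercept v v*r / subject=b;  /* block, block x plant, block x plant x sprout */
   lsmeans t / lines ilink;             /* letter groupings; ILINK adds data-scale means */

run;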


Some of the output is shown in Table 6.46. The values of the fit statistics and of the
conditional distribution statistics (parts (a) and (b)) are much smaller than those obtained
with the binomial distribution.

This indicates that the beta distribution is more appropriate for this dataset: Pearson's
$\chi^2/\text{df} = 1.03$, indicating that the overdispersion has been almost completely
controlled. The variance component estimates and the estimated scale parameter ($\phi$) are
tabulated in part (c). As in the previous analysis, the type III tests of fixed effects
(part (d)) indicate a highly significant treatment effect on the average proportion of leaves
with fungal disease.

The least squares means on the model scale (column “Estimate”) and on
the data scale (column “Mean”) are tabulated in Table 6.47. The results indicate that
Table 6.47 Estimated means (least squares means) on the model scale and on the data scale

Least squares means
t   Estimate   Standard error   DF   t-value   Pr > |t|   Mean     Standard error mean
1    0.7223    0.09989          83     7.23    <0.0001    0.6731   0.02198
2   −1.7482    0.1543           83   −11.33    <0.0001    0.1483   0.01949
3   −2.0178    0.2214           83    −9.11    <0.0001    0.1174   0.02294
4   −1.9358    0.1873           83   −10.34    <0.0001    0.1261   0.02064
5   −1.7887    0.2173           83    −8.23    <0.0001    0.1432   0.02667
6   −1.5360    0.1665           83    −9.23    <0.0001    0.1771   0.02427


Table 6.48 Mean comparison (LSD method)

T grouping of t least squares means (α = 0.05)
LS means with the same letter are not significantly different

t    Estimate
1     0.7223   A
6    −1.5360   B
2    −1.7482   B
5    −1.7887   B
4    −1.9358   B
3    −2.0178   B


all proposed treatments in this study reduce the proportion of diseased leaves 
compared to the control treatment (t = 1). 
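The Mean column in Table 6.47 is the inverse link applied to the Estimate column. Its standard error follows from the delta method: for the logit link, the derivative of the inverse link is π(1 − π), so SE(mean) ≈ SE(estimate) × π(1 − π). The sketch below applies this calculation to the model-scale values of Table 6.47, using only the Python standard library. We assume this is how the ILINK option computes these columns. The sketch reproduces the printed Mean and Standard error mean values to within rounding.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps the model scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def ilink_mean_and_se(estimate, se):
    """Data-scale mean and its delta-method standard error for a logit-link LS mean."""
    mean = inv_logit(estimate)
    # d(mean)/d(eta) = mean * (1 - mean) for the logit link
    return mean, se * mean * (1.0 - mean)

# Model-scale LS means (Estimate, Standard error) from Table 6.47
lsmeans = {1: (0.7223, 0.09989), 2: (-1.7482, 0.1543), 3: (-2.0178, 0.2214),
           4: (-1.9358, 0.1873), 5: (-1.7887, 0.2173), 6: (-1.5360, 0.1665)}

for t, (est, se) in lsmeans.items():
    mean, se_mean = ilink_mean_and_se(est, se)
    print(f"t={t}: mean={mean:.4f}, se={se_mean:.5f}")
```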

The mean comparison (LSD) obtained with the option “lines” indicates that the
proportion of diseased leaves in treatment one (the control) differs significantly from that
of the rest of the treatments, which do not differ among themselves (Table 6.48).


6.8 Exercises 


Exercise 6.8.1 Seeds of a particular crop were stored at four different temperatures
(T1, T2, T3, and T4) under four different chemical concentrations (0, 0.1, 1.0, and 10).
To study the effects of temperature and chemical concentration, a completely 
randomized experiment was conducted with a factorial treatment structure 4 x 4 
and four replicates. For each of the 64 experimental units, 50 seeds were placed in a 
dish and the number of seeds that germinated under standard conditions was 
recorded. Germination data were obtained from Mead et al. (1993, p. 325) 
(Table 6.49). 
Table 6.49 Seed germination experiment results

                              Chemical concentration
Temperature   0                0.1              1.0              10
T1            9, 9, 3, 7       13, 12, 14, 15   21, 23, 24, 27   40, 32, 43, 34
T2            19, 30, 21, 29   33, 32, 30, 26   43, 40, 37, 41   48, 48, 49, 48
T3            7, 1, 2, 2       1, 2, 4, 4       8, 10, 6, 7      3, 4, 8, 5
T4            4, 9, 3, 7       13, 6, 15, 7     16, 13, 18, 19   13, 18, 11, 16


Table 6.50 Results of the apple sprouts experiment 


Density of inoculum Cultivar Block 1 Block 2 Block 3 Block 4
200 Jonagold 5/1 5/2 5/1 5/0 
200 Golden delicious 5/1 5/0 5/0 5/0 
200 Jonathan 5/2 5/2 5/2 5/0 
1000 Jonagold 5/0 5/2 5/2 5/4 
1000 Golden delicious 5/0 5/0 5/2 5/0 
1000 Jonathan 5/4 5/4 5/4 5/0 
5000 Jonagold 5/5 5/5 5/4 5/5 
5000 Golden delicious 5/5 5/4 5/3 5/5 
5000 Jonathan 5/5 5/0 5/3 5/5 


The first number refers to the number of inoculations (n) and the second to the number of 
inoculations that developed the gangrenous sore (Y) 


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for this 
experiment. 

(b) List all the components of the GLMM in (a).

(c) Analyze this dataset and summarize the relevant results. 


Exercise 6.8.2 Data were obtained from an experiment in which separate sprouts of
apple trees were inoculated with macroconidia of the fungus Nectria galligena,
which causes apple canker. The experimental factors were inoculum density (three levels:
200, 1000, and 5000 macroconidia per ml) and variety (three levels: Jonagold, Golden
Delicious, and Jonathan). The experiment was carried out in 4 randomized blocks with
12 plots. Each plot consisted of one sprout on which five inoculations were made. The
numbers of successful inoculations per plot on day 17 after inoculation are shown in
Table 6.50.


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for this 
experiment. 

(b) List all the components of the GLMM from part (a). 

(c) Analyze this dataset and summarize the relevant results. 

(d) Is there extra variation in the dataset? What alternative distribution do you
propose? Reanalyze the data and compare the results. 


Exercise 6.8.3 This experiment concerns the germination efficiencies of protoplasts 
obtained from plants of seven species of the genera Lycopersicon (tomato) and 
Table 6.51 Protoplast germination experiment results


Species | Isolation | 1 2 3 4 5 6 7 8 9 10
1 1 89 |63 10.5 

1 2 31 |27 41 

1 3 21 |19 1.4 1.5 

1 4 25 |29 2.6 2.6 2.6 26 |28 27 |28 |27 
2 1 02 |0.9 0.5 0.6 1.2 0.4 

2 2 18 |16 1.6 

2 3 66 |75 5.4 5.3 5 6.5 | 6.3 58 59 | 5.6 
3 1 18 |1,5 1.9 1.7 1.3 1.5 

3 2 15 |32 1.1 1.3 1.8 12 116 14 |12 |18 
3 3 2 23 2.8 2.6 3.2 22 |25 24 |28 |24 
4 1 11.4 |113 |144 |137 

4 2 29 |38 4.7 5.1 2.7 3.2 

4 3 23 |44 4.8 4.9 5.8 47 |56 42 |33 |45 
5 1 215 |25.5 |181 |222 

5 2 187 | 20 

5 3 11.5 |131 |115 |162 |101 |172 |16 | 10.5

6 1 46 |34 2.7 3 4.1 3.1 

6 2 24 |24 2 2.5 3.6 32 |26 14 |25 |27 
6 3 1.6 | 1.1 1.6 1.3 1.6 1 0.8 13 |08 |22 
7 1 3 4 44 4.4 2.8 33 |45 2952 32 
7 2 25 |25 2,5 2.7 2.3 2.6 

7 3 26 | 2.7 29 2.7 2.7 2.6 

7 4 29 |3 3 3.1 


Solanum (potato). For each species, three or four protoplast isolates were used and,
depending on the availability of protoplasts, a variable number of plates was
prepared. Approximately $10^5$ protoplasts were placed in each Petri dish, and
after 4 weeks the proportion of dividing protoplasts was recorded. The results, in
percentages, are listed in Table 6.51.


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for the 
experimental design of this study. 

(b) Write down a generalized linear mixed model based on (a), assuming a beta
distribution for the response variable.

(c) Implement an analysis of these data according to the linear predictor and model 
in part (b). Summarize the relevant results. 


Exercise 6.8.4 The data in this example are the results of a triangle test in which 12 raters
tasted 10 pairs of coffee varieties (Table 6.52). In each triangle, the rater tasted three
cups, one of one variety and two of the other. Each rater evaluated 12 triangles for each
pair of varieties, 2 for each of the following sequences: AAB, ABA, BAA, ABB, BAB, and
BBA. The response is the number of triangles in which the rater correctly identified the
variety appearing only once. The experiment was conducted in two groups of six
Table 6.52 Triangle test (G = group, Eval = panelist, PdV = variety pair, V_A = variety A,
V_B = variety B, Y = number of correct discriminations, n = number of trials)


G |Eval 


12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 
12 


12 
10 


10 
10 
11 
11 
11 
11 


10 


11 


PdV |VA |VB 


G |Eval 


n 


Y 


PdV |VA УВ 


10 


10 
Table 6.52 (continued)

G Eval PdV V_A V_B Y n    G Eval PdV V_A V_B Y n

1 5 3 9 6 4 2 9 6 6 |12 
Է իտ 4 6 5 ծ 2 6 5 10 |12 
1 |5 5 6 8 6 2 6 8 5 |12 
1 |5 6 5 8 7 2 5 8 10 12 
1 |5 7 7 8 8 2 7 8 8 |12 
ІШЕ 8 7 9 9 2 7 9 6 |12 
1 |5 9 7 5 9 2 7 5 9 |12 
1 |5 10 7 6 8 2 7 6 9 |12 
1 |6 1 8 9 3 2 1 8 9 6 |12 
1 |6 2 5 9 9 |12 |2 |12 2 5 9 7 |12 
1 |6 3 9 6 6 |12 |2 |12 3 9 6 7 |12 
1 |6 4 6 5 9 |12 |2. |12 4 6 5 7 |12 
1 |6 5 6 8 7 |12 |2 |12 5 6 8 8 |12 
1 |6 6 5 8 10 |12 |2 |12 6 5 8 11 |12 
1 6 7 7 8 7 |12 |2 |12 7 7 8 9 |12 
1 6 8 7 9 7 |12 2 |12 8 z 9 9 |12 
1 6 9 7 5 8 |12 |2 | 12 9 T 5 10 |12 
1 6 10 7 6 9 |12 |2 |12 10 7 6 9 |12 


evaluators, with the aim of assessing the discriminating ability of the panelists for future
evaluations. The data are given in Table 6.52.


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for this 
experiment. 

(b) List all the components of the GLMM according to part (a). 

(c) Analyze this dataset and summarize the relevant results. 

(d) Is there extra variation in the dataset? If so, what alternative distribution do
you propose? Reanalyze the data and compare the results. 


Exercise 6.8.5 Several brewing techniques are used in the production of espresso
coffee. Among them, the most widespread are bar machines and single-dose pods, the
latter produced in large numbers because of their commercial popularity. This experiment
compares the effects of three brewing techniques on espresso quality, measured as the
foaming rate (Y, in percentage): method 1 = bar machine (BM), method 2 = hyper-espresso
method (HIP), and method 3 = I-espresso system (IT). Nine replicates per method were
carried out (Table 6.53).


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for the 
experimental design of this study. 

(b) Describe the generalized linear mixed model in (a), assuming a beta distribution. 

(c) Implement the analysis of these data according to the predictor and model in (b). 
Summarize the relevant results. 
Table 6.53 Experimental results of espresso coffee

Method  Index   Method  Index   Method  Index
1       36.64   2       70.84   3       56.19
1       39.65   2       46.68   3       36.67
1       37.74   2       73.19   3       35.35
1       35.96   2       57.78   3       40.11
1       38.52   2       48.61   3       33.52
1       21.02   2       72.77   3       37.12
1       24.81   2       65.04   3       37.33
1       34.18   2       62.53   3       32.68
1       23.08   2       54.26   3       48.33

Table 6.54 Results of the wheat germination experiment in pots (number of seeds that did
not germinate out of 50)

      Treatments
      1   2   3   4   5   6
A     10  11  8   7   6   9
B     10  3   9   3   11
C     11  2   10  7   11
D     1   6   4   13  7   10  10


Exercise 6.8.6 Choosing a scale for data consisting of small integers is not easy,
because the analysis should be, as far as possible, appropriate for the data so as to yield
estimates with as little uncertainty as possible. As a simple example of this type of data,
consider the following results from a wheat germination experiment in pots
(Table 6.54).


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for this 
experiment. 

(b) List all components of the GLMM in (a), assuming a binomial response variable. 

(c) Analyze this dataset and summarize the relevant results. 

(d) Is there extra variation in the dataset? If so, reanalyze the data with an
alternative distribution. Summarize and compare your findings. 


Exercise 6.8.7 A greenhouse experiment was carried out to investigate how a cucumber
disease (agurkesyge) spreads in two cucumber varieties; the spread is thought to depend on
the climate and on the amount of fertilizer applied. The following data come from the
Department of Plant Pathology. Two climates were used: (1) change to day temperature
3 hours before sunrise and (2) normal change to day temperature. Three amounts of
fertilizer were applied: normal (2.0 units), high (3.5 units), and very high (4.0 units).
The two varieties were Aminex and Dalibor. To have a better-controlled experiment, the
plants were “standardized” to have the same number of leaves, and then (on day 0, for
example) the plants were infected with the disease. Subsequently, 8 days after the plants
were infected, the amount of infection (in percentage) was recorded. From the resulting
infection curve, two measures were calculated (in a manner not specified here), namely,
the rate of spread of the disease (%) and the level of infection at the end of the disease
period. The experiment was implemented in three blocks, each of which consisted of two
sections. Each section consisted of three plots, which were divided into two subplots, each
of which had six to eight plants. Thus, there were a total of 36 subplots. The results were
recorded for each subplot. The experimental factors were randomly assigned to the
different units as follows: two climates to the two sections within each block, three
amounts of fertilizer to the three plots within each section, and, finally, the two varieties
to the two subplots within each plot. The data are shown in Table 6.55.


(a) Write down a statistical model of this experiment. 

(b) List all the components of the GLMM in (a). 

(c) Write down the null and alternative hypotheses associated with this experiment. 

(d) Construct an ANOVA table indicating the sources of variation and degrees of 
freedom. 

(e) Analyze the rate of disease spread to investigate the effect of different factors. 

(f) Comment on the results obtained. 


Exercise 6.8.8 This example is an experiment to assess in utero damage in
laboratory rodents after exposure to boric acid, a compound widely used in pesticides,
pharmaceuticals, and other household products (Heindel et al. 1992). The
study design included four doses of boric acid. The compound was administered to
pregnant female mice during the first 17 days of gestation; the females were then
sacrificed and their litters examined. The table below presents, for each litter, the number
of implants that died in utero (Y) out of the total number of implants (N) at each
of the four doses tested: d1 = 0 (control), d2 = 0.1, d3 = 0.2, and d4 = 0.4
(as percentage of boric acid in the diet) (Table 6.56).


(a) Write down an ANOVA table (sources of variation, degrees of freedom) for this 
experiment. 

(b) List all the components of the GLMM in (a). 

(c) Analyze this dataset and summarize the relevant results. 

(d) Is there extra variation in the dataset? If so, what alternative distribution do
you propose? Reanalyze the data and compare your findings. 
Table 6.55 Greenhouse experiment results of cucumber varieties

Block  Section  Plot  Weather  Fertilizer  Variety  Proportion (%)  Level
1      1        1     2        2           Aminex   48.8981         0.06915
1      1        1     2        2           Dalibor  42.2463         0.06595
1      1        2     2        3.5         Aminex   48.2108         0.04679
1      1        2     2        3.5         Dalibor  41.6767         0.04881
1      1        3     2        4           Aminex   55.4369         0.04025
1      1        3     2        4           Dalibor  40.9562         0.04859
1      2        4     1        2           Aminex   51.5573         0.09353
1      2        4     1        2           Dalibor  36.7739         0.10353
1      2        5     1        3.5         Aminex   47.9937         0.05327
1      2        5     1        3.5         Dalibor  47.8723         0.04397
1      2        6     1        4           Aminex   57.9171         0.05225
1      2        6     1        4           Dalibor  37.7185         0.09324
2      3        7     2        2           Aminex   60.1747         0.04182
2      3        7     2        2           Dalibor  45.6937         0.06983
2      3        8     2        3.5         Aminex   51.0017         0.08863
2      3        8     2        3.5         Dalibor  52.2796         0.03622
2      3        9     2        4           Aminex   51.1251         0.05875
2      3        9     2        4           Dalibor  48.7217         0.08169
2      4        10    1        2           Aminex   51.6001         0.07001
2      4        10    1        2           Dalibor  50.4463         0.09907
2      4        11    1        3.5         Aminex   48.3387         0.05788
2      4        11    1        3.5         Dalibor  38.6538         0.06834
2      4        12    1        4           Aminex   51.3147         0.05695
2      4        12    1        4           Dalibor  38.2488         0.07908
3      5        13    1        2           Aminex   49.6958         0.07218
3      5        13    1        2           Dalibor  29.6786         0.11351
3      5        14    1        3.5         Aminex   46.6692         0.08825
3      5        14    1        3.5         Dalibor  36.5892         0.09107
3      5        15    1        4           Aminex   56.032          0.04532
3      5        15    1        4           Dalibor  36.0955         0.08712
3      6        16    2        2           Aminex   45.979          0.08882
3      6        16    2        2           Dalibor  37.2489         0.12796
3      6        17    2        3.5         Aminex   40.7277         0.06418
3      6        17    2        3.5         Dalibor  38.4831         0.0854
3      6        18    2        4           Aminex   44.5242         0.06215
3      6        18    2        4           Dalibor  34.3907         0.09651
Table 6.56 Rodent experiment results
Dose Y N Dose Y N Dose Y N Dose Y N 
0 0 15 0.1 0 6 0.2 1 12 0.4 12 12 
0 0 3 0.1 1 14 0.2 0 12 0.4 1 12 
0 1 9 0.1 1 12 0.2 0 11 0.4 0 13 
0 1 12 0.1 0 10 0.2 0 13 0.4 2 8 
0 1 13 0.1 2 14 0.2 0 12 0.4 2 12 
0 2 13 0.1 0 12 0.2 0 14 0.4 4 13 
0 0 16 0.1 0 14 0.2 4 15 0.4 0 13 
0 0 11 0.1 3 14 0.2 0 14 0.4 1 13 
0 1 11 0.1 0 10 0.2 0 12 0.4 0 12 
0 2 8 0.1 2 12 0.2 1 6 0.4 1 9 
0 0 14 0.1 3 13 0.2 2 13 0.4 3 9 
0 0 13 0.1 1 11 0.2 0 10 0.4 0 11 
0 3 14 0.1 1 11 0.2 1 14 0.4 1 14 
0 1 13 0.1 0 11 0.2 1 12 0.4 0 10 
0 0 8 0.1 0 13 0.2 0 10 0.4 3 12 
0 0 13 0.1 0 10 0.2 0 9 0.4 2 21 
0 2 14 0.1 1 12 0.2 1 12 0.4 3 10 
0 3 14 0.1 0 11 0.2 0 13 0.4 3 11 
0 0 11 0.1 2 10 0.2 1 14 0.4 1 11 
0 2 12 0.1 2 12 0.2 0 13 0.4 1 11 
0 0 15 0.1 2 15 0.2 0 14 0.4 8 14 
0 0 15 0.1 3 12 0.2 1 13 0.4 0 15 
0 2 14 0.1 1 12 0.2 2 12 0.4 2 13 
0 1 11 0.1 0 12 0.2 1 14 0.4 8 11 
0 1 16 0.1 1 12 0.2 0 13 0.4 4 12 
0 0 12 0.1 1 13 0.2 0 12 0.4 2 12 
0 0 14 0.1 1 15 0.2 1 2 

Appendix
Data: Fleas

Bioen SP Treat Rep Overvi Dead
B1 Daphnia T1 1 10 0
B1 Daphnia T1 2 10 0
B1 Daphnia T1 3 10 0
B1 Daphnia T2 1 10 0
B1 Daphnia T2 2 10 0
B1 Daphnia T2 3 10 0
B1 Daphnia T3 1 9 1
B1 Daphnia T3 2 9 1
B1 Daphnia T3 3 8 2
B1 Daphnia T4 1 2 8
B1 Daphnia T4 2 2 8
B1 Daphnia T4 3 3 7
B1 Daphnia T5 1 0 10
B1 Daphnia T5 2 0 10
B1 Daphnia T5 3 0 10
B1 Daphnia T6 1 0 10
B1 Daphnia T6 2 0 10
B1 Daphnia T6 3 0 10
B2 Daphnia T1 1 10 0
B2 Daphnia T1 2 10 0
B2 Daphnia T1 3 10 0
B2 Daphnia T2 1 10 0
B2 Daphnia T2 2 10 0
B2 Daphnia T2 3 10 0
B2 Daphnia T3 1 9 1
B2 Daphnia T3 2 9 1
B2 Daphnia T3 3 9 1
B2 Daphnia T4 1 2 8
B2 Daphnia T4 2 2 8
B2 Daphnia T4 3 2 8
B2 Daphnia T5 1 0 10
B2 Daphnia T5 2 0 10
B2 Daphnia T5 3 0 10
B2 Daphnia T6 1 0 10
B2 Daphnia T6 2 0 10
B2 Daphnia T6 3 0 10
B3 Daphnia T1 1 10 0
B3 Daphnia T1 2 10 0
B3 Daphnia T1 3 10 0
B3 Daphnia T2 1 10 0
B3 Daphnia T2 2 10 0
B3 Daphnia T2 3 10 0
B3 Daphnia T3 1 8 2
B3 Daphnia T3 2 9 1
B3 Daphnia T3 3 9 1
B3 Daphnia T4 1 3 7
B3 Daphnia T4 2 2 8
B3 Daphnia T4 3 2 8
B3 Daphnia T5 1 0 10
B3 Daphnia T5 2 0 10
B3 Daphnia T5 3 0 10
B3 Daphnia T6 1 0 10
B3 Daphnia T6 2 0 10
B3 Daphnia T6 3 0 10
B1 Dubia T1 1 10 0
B1 Dubia T1 2 10 0
B1 Dubia T1 3 10 0
B1 Dubia T2 1 5 5
B1 Dubia T2 2 6 4
B1 Dubia T2 3 6 4
B1 Dubia T3 1 5 5
B1 Dubia T3 2 5 5
B1 Dubia T3 3 5 5
B1 Dubia T4 1 2 8
B1 Dubia T4 2 3 7
B1 Dubia T4 3 3 7
B1 Dubia T5 1 2 8
B1 Dubia T5 2 2 8
B1 Dubia T5 3 2 8
B1 Dubia T6 1 0 10
B1 Dubia T6 2 0 10
B1 Dubia T6 3 0 10
B2 Dubia T1 1 10 0
B2 Dubia T1 2 10 0
B2 Dubia T1 3 10 0
B2 Dubia T2 1 7 3
B2 Dubia T2 2 5 5
B2 Dubia T2 3 6 4
B2 Dubia T3 1 5 5
B2 Dubia T3 2 5 5
B2 Dubia T3 3 5 5
B2 Dubia T4 1 4 6
B2 Dubia T4 2 4 6
B2 Dubia T4 3 4 6
B2 Dubia T5 1 2 8
B2 Dubia T5 2 2 8
B2 Dubia T5 3 2 8
B2 Dubia T6 1 0 10
B2 Dubia T6 2 0 10
B2 Dubia T6 3 0 10
B3 Dubia T1 1 10 0
B3 Dubia T1 2 10 0
B3 Dubia T1 3 10 0
B3 Dubia T2 1 8 2
B3 Dubia T2 2 8 2
B3 Dubia T2 3 7 3
B3 Dubia T3 1 5 5
B3 Dubia T3 2 5 5
B3 Dubia T3 3 6 4
B3 Dubia T4 1 2 8
B3 Dubia T4 2 3 7
B3 Dubia T4 3 2 8
B3 Dubia T5 1 3 7
B3 Dubia T5 2 2 8
B3 Dubia T5 3 2 8
B3 Dubia T6 1 0 10
B3 Dubia T6 2 0 10
B3 Dubia T6 3 0 10
Data: Commercial crop explant detachment 

Block A B C y N 

1 1 1 1 15 73 
2 1 1 1 10 86 
1 1 1 2 17 69 
2 1 1 2 19 32 
1 1 2 1 26 125 
2 1 2 1 21 62 
1 1 2 2 14 81 
2 1 2 2 12 21 
1 2 1 1 10 92 
2 2 1 1 12 108
1 2 1 2 30 44 
2 2 1 2 32 33 
1 2 2 1 37 91 
2 2 2 1 30 42 
1 2 2 2 32 98 
2 2 2 2 37 44 
1 3 1 1 18 52 
2 3 1 1 18 73 
1 3 1 2 23 108 
2 3 1 2 21 55 
1 3 2 1 24 106 
2 3 2 1 27 92 
1 3 2 2 37 64 
2 3 2 2 37 97 
Data: Cockroaches (E1 = np, E2 = ng, E3 = adult)

Bioassay Isolation Age Dead
1 Bb9 np 5
2 Bb9 np 8
1 Bb9 ng 6
2 Bb9 ng 2
1 Bb9 a 7
2 Bb9 a 5
1 Bb10 np 8
2 Bb10 np 6
1 Bb10 ng 1
2 Bb10 ng 4
1 Bb10 a 3
2 Bb10 a 4
1 Bb11 np 8
2 Bb11 np 7
1 Bb11 ng 1
2 Bb11 ng 3
1 Bb11 a 6
2 Bb11 a 8
1 Bb12 np 8
2 Bb12 np 9
1 Bb12 ng 8
2 Bb12 ng 9
1 Bb12 a 7
2 Bb12 a 6
1 Bb13 np 6
2 Bb13 np 3
1 Bb13 ng 0
2 Bb13 ng 1
1 Bb13 a 5
2 Bb13 a 6
1 Bb14 np 10
2 Bb14 np 5
1 Bb14 ng 4
2 Bb14 ng 2
1 Bb14 a 6
2 Bb14 a 6
1 Bb15 np 5
2 Bb15 np 10
1 Bb15 ng 6
2 Bb15 ng 1
1 Bb15 a 4
2 Bb15 a 5
1 Bb16 np 5
2 Bb16 np 7
1 Bb16 ng 3
2 Bb16 ng 4
1 Bb16 a 8
2 Bb16 a 6
1 Control np 1
2 Control np 0
1 Control ng 0
2 Control ng 0
1 Control a 0
2 Control a 1
Data: Disease incidence in grapevine plants (b = block, v = plant, r = shoot, t = treatment,
m = number of diseased leaves per shoot, and n = total number of leaves per shoot)

b v r t m n

1 1 5 3 0 9 
1 1 5 4 2 12 
1 1 5 5 0 10 
1 1 5 6 1 11 
1 2 1 1 7 9 
1 2 1 2 2 10 
1 2 1 3 0 10 
1 2 1 4 0 14 
1 2 1 5 1 12 
1 2 1 6 0 13 
1 2 2 1 6 12 
1 2 2 2 0 11 
1 2 2 3 1 13 
1 2 2 4 0 9 
1 2 2 5 2 11 
1 2 2 6 0 10 
1 2 3 1 6 7 
1 2 3 2 1 12 
1 2 3 3 0 9 
1 2 3 4 1 10 
1 2 3 5 0 14 
1 2 3 6 2 12 
1 2 4 1 7 13 
1 2 4 2 0 10 
1 2 4 3 0 10 
1 2 4 4 1 12 
1 2 4 5 0 9 
1 2 4 6 1 8 
1 2 5 1 11 15 
1 2 5 2 1 13 
1 2 5 3 0 14 
1 2 5 4 1 14 
1 2 5 5 0 11 
1 2 5 6 0 11 
1 3 1 1 5 11 
1 3 1 2 5 11 
1 3 1 3 0 15 
1 3 1 4 1 15 
1 3 1 5 0 8 
1 3 1 6 1 10 
1 3 2 1 4 9
1 3 2 2 1 15 
m
11
13
12
12
12
14
12
12
10
13
10
10
10
11
11
11
11
14
12
10
12
10
12
11
b v r t m n

2 1 4 3 0 10 
2 1 4 4 1 9 
2 1 4 5 1 10 
2 1 4 6 0 16 
2 1 5 1 10 14 
2 1 5 2 1 9 
2 1 5 3 0 11 
2 1 5 4 0 11 
2 1 5 5 0 11 
2 1 5 6 0 11 
2 2 1 1 1 9
2 2 1 2 0 9
2 2 1 3 0 12 
2 2 1 4 1 10 
2 2 1 5 1 12 
2 2 1 6 0 17 
2 2 2 1 9 12 
2 2 2 2 0 12 
2 2 2 3 0 11 
2 2 2 4 2 14 
2 2 2 5 0 11 
2 2 2 6 0 10 
2 2 3 1 7 13
2 2 3 2 0 16 
2 2 3 3 1 12 
2 2 3 4 0 10 
2 2 3 5 0 10 
2 2 3 6 0 11 
2 2 4 1 7 13
2 2 4 2 1 18 
2 2 4 3 0 10 
2 2 4 4 0 11 
2 2 4 5 0 11 
2 2 4 6 3 13 
2 2 5 1 5 10 
2 2 5 2 0 10 
2 2 5 3 0 10 
2 2 5 4 0 10 
2 2 5 5 0 9 
2 2 5 6 1 12 
2 3 1 1 6 13 
2 3 1 2 0 10 
11

b v r t m n

3 1 3 3 1 13 
3 1 3 4 0 18 
3 1 3 5 0 14 
3 1 3 6 0 14 
3 1 4 1 10 14 
3 1 4 2 2 17 
3 1 4 3 0 10 
3 1 4 4 1 19 
3 1 4 5 0 17 
3 1 4 6 0 16 
3 1 5 1 9 10 
3 1 5 2 1 14 
3 1 5 3 1 11 
3 1 5 4 0 18 
3 1 5 5 0 15 
3 1 5 6 1 11 
3 2 1 1 10 10 
3 2 1 2 1 11 
3 2 1 3 0 12 
3 2 1 4 1 15 
3 2 1 5 4 20 
3 2 1 6 0 14 
3 2 2 1 9 12
3 2 2 2 1 10 
3 2 2 3 1 12 
3 2 2 4 3 18 
3 2 2 5 0 16 
3 2 2 6 0 12 
3 2 3 1 10 11 
3 2 3 2 1 16 
3 2 3 3 1 14 
3 2 3 4 1 17 
3 2 3 5 2 15 
3 2 3 6 1 16 
3 2 4 1 9 11 
3 2 4 2 2 14 
3 2 4 3 0 10 
3 2 4 4 0 18 
3 2 4 5 0 17 
3 2 4 6 0 12 
3 2 5 1 11 12 
3 2 5 2 2 12 
Chapter 7
Time of Occurrence of an Event of Interest


7.1 Introduction 


In fields such as the biological sciences, animal science, and agronomy, a common
outcome of interest is the time at which an event of interest occurs. The main
characteristic of these data is that the subjects or experimental units are usually
observed for different periods of time until the event of interest occurs. These events
may be adverse, such as the death of an experimental unit or the cessation of
lactation, or positive, such as conception following a particular treatment or the onset
of estrus in a female undergoing hormone treatment. Because of the characteristics of
these response variables, a normal distribution is often a poor choice for modeling the
time at which the event of interest occurs. The exponential, log-normal, gamma, and
Weibull distributions, among other more complex ones, are more common and better
choices for modeling these phenomena.

Fitting a generalized linear mixed model (GLMM) is a good option for analyzing
these phenomena because it allows the conditional distribution of the response, given
the random effects, to be taken from one of these non-normal families. In this vein, it is
conventional to speak of survival data and survival analysis, regardless of the nature of
the event. Similar data also arise when measuring the time needed to complete a task,
such as walking 50 meters, passing an agronomy exam, or performing a sensory evaluation
of coffee. The purpose of this chapter is to provide the reader with the essential language
of linear models and the connection between GLMMs and survival analysis.
7.2 Generalized Linear Mixed Models with a Gamma
Response


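The gamma family of distributions describes continuous, positive, right-skewed
values. A gamma distribution has two positive parameters, α (shape) and β (scale), and its
probability density function is given by:

$$f(y; \alpha, \beta) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, y^{\alpha - 1} e^{-y/\beta}, \quad y > 0,$$

where $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t}\, dt$ is the gamma function (Casella and Berger 2002). The
mean and variance of a gamma random variable are $E[Y] = \alpha\beta = \mu$ and
$\text{Var}[Y] = \alpha\beta^2 = \phi\mu^2$, respectively. This density function can be rewritten in terms
of the mean $\mu$ and the scale parameter $\phi = 1/\alpha$:

$$f(y; \mu, \phi) = \frac{1}{\Gamma(1/\phi)\, y} \left(\frac{y}{\mu\phi}\right)^{1/\phi} \exp\!\left(-\frac{y}{\mu\phi}\right), \quad y > 0.$$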
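The two forms of the density are the same function: substituting α = 1/φ and β = μφ into the first form gives the second. The sketch below uses only the Python standard library; the values μ = 10 and φ = 0.25 are arbitrary illustrative choices. It evaluates both forms and checks by trapezoidal integration that the density integrates to 1, with mean μ and variance φμ².

```python
import math

def gamma_pdf_shape_scale(y, alpha, beta):
    """Gamma density with shape alpha and scale beta (mean alpha * beta)."""
    log_f = ((alpha - 1.0) * math.log(y) - y / beta
             - math.lgamma(alpha) - alpha * math.log(beta))
    return math.exp(log_f)

def gamma_pdf_mean_scale(y, mu, phi):
    """Same density parameterized by the mean mu and scale phi = 1/alpha."""
    z = y / (mu * phi)
    log_f = (1.0 / phi) * math.log(z) - z - math.lgamma(1.0 / phi) - math.log(y)
    return math.exp(log_f)

def numeric_moments(pdf, upper, steps=20000):
    """Trapezoidal integral, mean and variance of pdf over (0, upper].

    The y = 0 endpoint is skipped; the density is zero there when alpha > 1.
    """
    h = upper / steps
    total = first = second = 0.0
    for k in range(1, steps + 1):
        y = k * h
        w = 0.5 * h if k == steps else h   # half weight at the right endpoint
        f = pdf(y)
        total += w * f
        first += w * y * f
        second += w * y * y * f
    return total, first, second - first ** 2

mu, phi = 10.0, 0.25
total, mean, var = numeric_moments(lambda y: gamma_pdf_mean_scale(y, mu, phi), upper=200.0)
print(round(total, 6), round(mean, 4), round(var, 3))
```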


7.2.1 CRD: Estrus Induction in Pelibuey Ewes 


Estrus induction in ewes is a very common practice on livestock farms and at research centers. In this experiment, a researcher used gonadotropin-releasing hormone (GnRH), equine chorionic gonadotropin (eCG), and P4 (progesterone) in a controlled internal drug-releasing (CIDR) intravaginal device in Pelibuey ewes (n = 78), with single, double, and triple lambing as treatments. To ensure that all animals were in good condition during the experiment, the ewes received the same zootechnical management and feeding, and they were all synchronized on the same day under the same synchronization protocol. Table 7.1 shows the sources of variation and degrees of freedom for the analysis of variance (ANOVA).

The variables evaluated in this experiment were the time to onset and the duration of estrus ($y_{ij}$), both in hours, according to the type of lambing. The variability among ewes in weight, age, and body condition must be taken into account in the


Table 7.1 Sources of variation and degrees of freedom

Sources of variation | Degrees of freedom
Treatment | t − 1 = 3 − 1 = 2
Error | Σ(r_i − 1) = 78 − 3 = 75
Total | Σr_i − 1 = 78 − 1 = 77
Table 7.2 Results of the analysis of variance

(a) Covariance parameter estimates

Cov Parm | Start of estrus: Estimate | Start of estrus: Standard error | Duration of estrus: Estimate | Duration of estrus: Standard error
Birth type (animal) (σ²_birthtype(animal)) | −0.01572 | 0.08370 | 0.09692 |
Residual (φ) | 0.06668 | 0.01232 | 0.2073 | 0.08938

(b) Type III tests of fixed effects

Effect | Num DF | Den DF | Start of estrus: F-value | Start of estrus: Pr > F | Duration of estrus: F-value | Duration of estrus: Pr > F
Birth type | 2 | 75 | 5.12 | 0.0082 | 22.61 | <0.0001


analysis. The data from this experiment can be found in Appendix 1 of this book (Data: Pelibuey Sheep). The components of a gamma GLMM are as follows:


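Distributions: $y_{ij}\mid r(\tau)_{ij}\sim\text{Gamma}(\mu_{ij},\phi)$; $i=1,2,3$; $j=1,\ldots,r_i$
$r(\tau)_{ij}\sim N(0,\sigma^2_{\text{birthtype(animal)}})$
Linear predictor: $\eta_{ij}=\mu+\tau_i+r(\tau)_{ij}$

Link function: $\log(\mu_{ij})=\eta_{ij}$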


where $\eta_{ij}$ is the linear predictor for treatment $i$ (type of birth: single, double, or triple) in ewe $j$, $\mu$ is the overall mean, $\tau_i$ is the fixed effect of type of birth (treatment), and $r(\tau)_{ij}$ is the random effect of ewe $j$ within type of birth $i$, with $r(\tau)_{ij}\sim N(0,\sigma^2_{\text{birthtype(animal)}})$.
The following GLIMMIX program fits the model:


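proc glimmix nobound method=laplace;    /* nobound allows negative variance estimates */
   class animal birthtype;
   model Inestro = birthtype / dist=gamma link=log;   /* gamma response, log link */
   random birthtype(animal);            /* ewe within type of birth */
   lsmeans birthtype / lines ilink;     /* ilink adds means on the data scale */
run;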


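Part of the results is reported in Table 7.2. Part (a) shows the estimated variance component for ewes within type of birth ($\hat\sigma^2_{\text{birthtype(animal)}}=-0.0157\pm0.0837$ for the onset of estrus; a negative estimate is possible because the nobound option removes the nonnegativity constraint on variance components) as well as the scale parameter ($\hat\phi=0.06668$).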


Table 7.2(b) shows the results of the type III tests of fixed effects, which indicate a statistically significant effect of treatment (type of birth) on both the time to onset and the duration of estrus.


Table 7.3 Means and standard errors on the model scale (“Estimate” column) and the data scale (“Mean” column) for the onset and duration of estrus in Pelibuey ewes

Birth type least squares means

Birth type | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
Start of estrus
1 | 3.2913 | 0.06631 | 75 | 49.63 | <0.0001 | 26.8787 | 1.7824
2 | 3.0622 | 0.04606 | 75 | 66.48 | <0.0001 | 21.3735 | 0.9845
3 | 3.0496 | 0.04542 | 75 | 67.14 | <0.0001 | 21.1059 | 0.9586
Duration of estrus
1 | 1.6826 | 0.1518 | 75 | 11.09 | <0.0001 | 5.3795 | 0.8164
2 | 2.6716 | 0.1171 | 75 | 22.81 | <0.0001 | 14.4637 | 1.6938
3 | 2.8075 | 0.09846 | 75 | 28.51 | <0.0001 | 16.5684 | 1.6313


The last two columns of Table 7.3, labeled “Mean” and “Standard error mean,” give the data-scale means ($\hat\mu_i$) of the time to onset and the duration of estrus, with their respective standard errors. For example, the mean time to onset of estrus in single-birth ewes was 26.88 ± 1.78 hours, whereas for double- and triple-birth ewes it was 21.37 ± 0.98 and 21.11 ± 0.96 hours, respectively. In contrast, estrus lasted longer in double- and triple-birth ewes (14.46 ± 1.69 and 16.57 ± 1.63 hours, respectively) than in single-birth ewes (5.38 ± 0.82 hours).
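The data-scale columns of Table 7.3 can be reproduced, up to rounding, from the model-scale columns. With the log link, the mean is $\exp(\text{Estimate})$, and a first-order (delta method) approximation gives its standard error as $\exp(\text{Estimate})\times\text{SE}$. The sketch below assumes this delta-method form; `ilink_log` is a hypothetical helper, not a SAS function.

```python
import math

def ilink_log(estimate, std_error):
    """Back-transform a log-link estimate to the data scale.

    mean     = exp(estimate)
    SE(mean) ~ exp(estimate) * SE(estimate)   (delta method: d exp(eta)/d eta = exp(eta))
    """
    mean = math.exp(estimate)
    return mean, mean * std_error

# Model-scale estimates and standard errors for the onset of estrus (Table 7.3)
onset = {1: (3.2913, 0.06631), 2: (3.0622, 0.04606), 3: (3.0496, 0.04542)}

for birth_type, (eta, se) in onset.items():
    mean, se_mean = ilink_log(eta, se)
    print(f"birth type {birth_type}: {mean:.2f} +/- {se_mean:.2f} hours")
```

Small differences in the last digit relative to the table arise because SAS works with unrounded estimates.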


7.2.2 Randomized Complete Block Design (RCBD): Itch 
Relief Drugs 


A total of 10 male volunteers between 20 and 30 years of age participated in a study comparing 7 treatments (Trts) (5 drugs, 1 placebo, and 1 no drug) for relieving itching. Because the subjects differ in their responses to the drugs, and each subject received every treatment, one per day over the 7 days of the study, each subject can be considered a block. The order of the treatments was randomized across days. Except on the drug-free day, subjects were given the treatment intravenously, and itching was then induced on their forearms with an effective itch stimulus called cowage. The duration of itching, in seconds, was recorded. The data are shown in Table 7.4.

The drugs used were papaverine (Papv), morphine (Morp), aminophylline (Amino), pentobarbital (Pent), and tripelennamine (Tripel).

The analysis of variance table (Table 7.5) shows the sources of variation and 
degrees of freedom for this experiment. 
Table 7.4 Time taken to get rid of the itch (seconds)


Patient No drug Placebo Papv Amino Pent Tripel 
1 174 263 105 141 108 141 
2 224 213 103 168 341 184 
3 260 231 145 78 159 125 
4 255 291 103 164 135 227 
5 165 168 144 127 239 194 
6 237 121 94 114 136 155 
7 191 137 35 96 140 121 
8 100 102 133 222 134 129 
9 115 89 83 165 185 79 
10 189 433 237 168 188 317 


Table 7.5 Sources of variation and degrees of freedom

Sources of variation | Degrees of freedom
Blocks | r − 1 = 10 − 1 = 9
Treatment | t − 1 = 7 − 1 = 6
Error | (t − 1)(r − 1) = 6 × 9 = 54
Total | r × t − 1 = 10 × 7 − 1 = 69


The components of the GLMM with a gamma response are as follows:

Distributions: $y_{ij}\mid r_j\sim\text{Gamma}(\mu_{ij},\phi)$; $i=1,\ldots,7$; $j=1,\ldots,10$
$r_j\sim N(0,\sigma^2_{\text{patient}})$
Linear predictor: $\eta_{ij}=\mu+r_j+\tau_i$
Link function: $\log(\mu_{ij})=\eta_{ij}$

where $\eta_{ij}$ is the linear predictor for treatment $i$ in block (patient) $j$, $\mu$ is the overall mean, $r_j$ is the random effect of patient $j$ with $r_j\sim N(0,\sigma^2_{\text{patient}})$, and $\tau_i$ is the fixed effect of treatment $i$.


Note that although the canonical link for the exponential and gamma distributions is the inverse (reciprocal) of the mean, gamma and exponential GLMMs most often use the log link (link = log), which is computationally more stable and guarantees a positive mean, $\mu=e^{\eta}>0$. The log link was used in this and in the previous analysis.

The following GLIMMIX syntax fits the GLMM for this randomized complete block design:


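proc glimmix nobound method=laplace;
   class Patient Trt;
   model y = Trt / dist=gamma link=log;   /* gamma response, log link */
   random Patient;                        /* patients are random blocks */
   lsmeans Trt / lines ilink;             /* means on model and data scales */
run;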
Table 7.6 Results of the analysis of variance

(a) Fit statistics for conditional distribution

−2 Log L (conditional on random effects) | 728.62
Pearson's chi-square | 5.69
Pearson's chi-square/DF | 0.08

(b) Covariance parameter estimates

Cov Parm | Estimate | Standard error
Patient | 0.03964 | 0.02375
Residual (φ) | 0.09132 | 0.01640

(c) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Trt | 6 | 54 | 3.82 | 0.0030


The conditional fit statistic (Pearson's chi-square/DF = 0.08, well below 1, so there is no evidence of overdispersion), together with the patient variance component and the scale parameter ($\phi$), indicates that the gamma model adequately describes the dataset (Table 7.6, parts (a) and (b)). The analysis of variance (Table 7.6, part (c)) indicates a highly significant difference among treatments in the mean duration of itching (P = 0.0030).

The spread of the residuals versus the linear predictor (top left panel of Fig. 7.1) suggests that the variance is constant. The histogram (top right) shows a nearly symmetrical pattern with little skewness. Furthermore, the quantile plot of the residuals (bottom left) shows no marked deviations, indicating that the fit is adequate. Finally, the bottom right plot shows that the residuals are centered at zero and range between −0.5 and 0.75.

The lsmeans on the data scale for each of the five drugs, the placebo, and the no-drug control are shown in the “Mean” column of Table 7.7, with their respective standard errors. Each of the five drugs appears to have a significant effect compared with the placebo and the control, and papaverine (Papv) is the most effective drug. The placebo and the no-drug control have statistically similar means. The relatively large variability in the placebo group suggests that some patients responded negatively to the placebo compared with the control, whereas others responded positively.

Figure 7.2 shows that the drug papaverine significantly reduced the itching time, 
followed by the drugs aminophylline and morphine, whereas the efficacies of the 
drugs pentobarbital and tripelennamine were highly similar to each other in elimi- 
nating itching. 


7.2.3 Factorial Design: Insect Survival Time 


This experiment studied the effectiveness of four insecticides (Insec1, Insec2, Insec3, and Insec4) at three concentration levels (low, medium, and high) on the survival time (in hours) of a particular species
Fig. 7.1 Conditional residuals


Table 7.7 Means and standard errors on the model scale (“Estimate” column) and the data scale (“Mean” column) for the average duration of itching

Trt least squares means

Trt | Estimate | Standard error | t-value | p-value | Mean | Standard error mean
Amino | 4.9795 | 0.1149 | 43.32 | <0.0001 | 145.41 | 16.7129
Morp | 4.9797 | 0.1146 | 43.44 | <0.0001 | 145.43 | 16.6733
Papv | 4.7356 | 0.1149 | 41.20 | <0.0001 | 113.93 | 13.0956
Pento | 5.1703 | 0.1149 | 44.99 | <0.0001 | 175.97 | 20.2211
Placebo | 5.2704 | 0.1151 | 45.79 | <0.0001 | 194.49 | 22.3867
No drug | 5.2542 | 0.1148 | 45.76 | <0.0001 | 191.36 | 21.9723
Tripel | 5.0802 | 0.1147 | 44.28 | <0.0001 | 160.80 | 18.4487


of beetles (Appendix 1: Data: Beetles). Crossing the two factors (insecticide × dose) yields a total of 12 combinations (treatments). The objective of this study was to compare the effects of insecticide, dose, and their interaction on beetle survival time. Because of the intrinsic characteristics of each insect, insects must be considered a source of variation in the experiment, since they respond differently
Table 7.8 Sources of variation and degrees of freedom

Sources of variation | Degrees of freedom
Blocks | r − 1 = 4 − 1 = 3
Insecticide | a − 1 = 4 − 1 = 3
Dose | b − 1 = 3 − 1 = 2
Insecticide × dose | (a − 1)(b − 1) = 3 × 2 = 6
Error | (ab − 1)(r − 1) = 11 × 3 = 33
Total | r × a × b − 1 = 4 × 4 × 3 − 1 = 47


to certain stimuli. The 48 available beetles were randomly divided equally into 4 groups (blocks), and the 12 treatment combinations were randomly assigned within each group, so that each treatment was applied to four beetles.

The sources of variation and degrees of freedom for this experiment are shown in the analysis of variance table (Table 7.8).

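The components of the gamma-response GLMM are as follows:

Distributions: $y_{ijk}\mid r_k\sim\text{Gamma}(\mu_{ijk},\phi)$; $i=1,\ldots,4$; $j=1,2,3$; $k=1,\ldots,4$
$r_k\sim N(0,\sigma^2_{\text{block}})$
Linear predictor: $\eta_{ijk}=\mu+r_k+\alpha_i+\beta_j+(\alpha\beta)_{ij}$
Link function: $\log(\mu_{ijk})=\eta_{ijk}$

where $\alpha_i$ is the fixed effect of insecticide $i$, $\beta_j$ is the fixed effect of dose $j$, $(\alpha\beta)_{ij}$ is their interaction, and $r_k$ is the random effect of block $k$.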


The following GLIMMIX commands fit a GLMM with a gamma response:
Table 7.9 Results of the analysis of variance

(a) Fit statistics for conditional distribution

−2 Log L (time, conditional on random effects) | 121.05
Pearson's chi-square | 1.91
Pearson's chi-square/DF | 0.04

(b) Covariance parameter estimates

Cov Parm | Estimate | Standard error
Block | −0.00173 |
Residual (φ) | 0.04155 | 0.008818

(c) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Dose | 2 | 33 | 69.61 | <0.0001
Insecticide | 3 | 33 | 31.36 | <0.0001
Dose*insecticide | 6 | 33 | 2.05 | 0.0868


proc glimmix nobound method=laplace;
   class dose insecticide insect;
   model time = dose|insecticide / dist=gamma link=log;  /* main effects and interaction */
   random insect;                                       /* random block effect */
   lsmeans dose insecticide dose*insecticide / lines ilink;
run;


Part of the Statistical Analysis System (SAS) output is shown in Table 7.9. The conditional Pearson's chi-square/DF of 0.04 indicates that the gamma distribution adequately models the data. The estimated variance component for blocks and the scale parameter, given by the “Residual” row, are shown in part (b) ($\hat\sigma^2_{\text{block}}=-0.00173$ and $\hat\phi=0.04155$, respectively).

Part (c) of Table 7.9 shows that insecticide and dose both have significant effects (P < 0.0001) on beetle survival time, that is, they differ in effectiveness (toxicity). The interaction between the two factors is not significant at the 5% level, although it is close (P = 0.0868). The lsmeans on the data scale for dose, $\hat\mu_j$ (part (a)), and for insecticide, $\hat\mu_i$ (part (b)), with their respective standard errors, are listed under the columns “Mean” and “Standard error mean” of Table 7.10.

The combination of the levels of both factors affected the average survival time of the beetles (Table 7.11). Insecticides 1 and 3 at the high dose gave the lowest survival times, with averages of 2.10 ± 0.209 and 2.35 ± 0.234 hours, respectively. In general, shorter survival times were observed for insecticides 1 and 3 than for insecticides 2 and 4.


7.2.4 A Split Plot with a Factorial Structure on a Large Plot in a Completely Randomized Design (CRD)


Four samples were obtained from each of two batches (Reps) of unprocessed gum from Acacia sp. trees, giving eight samples in total. Within each batch, the four
Table 7.10 Means and standard errors on the model scale (“Estimate”) and the data scale (“Mean”) for dose and type of insecticide

(a) Dose least squares means

Dose | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
High | 0.9960 | 0.04984 | 33 | 19.98 | <0.0001 | 2.7075 | 0.1349
Low | 1.7840 | 0.04984 | 33 | 35.79 | <0.0001 | 5.9538 | 0.2967
Medium | 1.6203 | 0.04984 | 33 | 32.51 | <0.0001 | 5.0548 | 0.2519

(b) Insecticide least squares means

Insecticide | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
Insec1 | 1.1074 | 0.05755 | 33 | 19.24 | <0.0001 | 3.0265 | 0.1742
Insec2 | 1.8272 | 0.05755 | 33 | 31.75 | <0.0001 | 6.2166 | 0.3578
Insec3 | 1.3041 | 0.05755 | 33 | 22.66 | <0.0001 | 3.6845 | 0.2121
Insec4 | 1.6284 | 0.05755 | 33 | 28.29 | <0.0001 | 5.0960 | 0.2933


Table 7.11 Means and standard errors on the model scale and the data scale for the interaction between dose and type of insecticide

Dose*insecticide least squares means

Dose | Insecticide | Estimate | Standard error | t-value | p-value | Mean | Standard error mean
High | Insec1 | 0.7419 | 0.09968 | 7.44 | <0.0001 | 2.1000 | 0.2093
High | Insec2 | 1.2089 | 0.09968 | 12.13 | <0.0001 | 3.3499 | 0.3339
High | Insec3 | 0.8545 | 0.09969 | 8.57 | <0.0001 | 2.3501 | 0.2343
High | Insec4 | 1.1788 | 0.09969 | 11.82 | <0.0001 | 3.2503 | 0.3240
Low | Insec1 | 1.4171 | 0.09968 | 14.22 | <0.0001 | 4.1250 | 0.4112
Low | Insec2 | 2.1747 | 0.09968 | 21.82 | <0.0001 | 8.7998 | 0.8772
Low | Insec3 | 1.7361 | 0.09969 | 17.42 | <0.0001 | 5.6754 | 0.5658
Low | Insec4 | 1.8082 | 0.09968 | 18.14 | <0.0001 | 6.0994 | 0.6080
Medium | Insec1 | 1.1632 | 0.09969 | 11.67 | <0.0001 | 3.2000 | 0.3190
Medium | Insec2 | 2.0980 | 0.09968 | 21.05 | <0.0001 | 8.1499 | 0.8124
Medium | Insec3 | 1.3218 | 0.09969 | 13.26 | <0.0001 | 3.7501 | 0.3738
Medium | Insec4 | 1.8984 | 0.09969 | 19.04 | <0.0001 | 6.6753 | 0.6654


samples were randomly assigned to the combinations of two factors with two levels each. The first factor was whether or not the gum was demineralized, and the second was whether or not it was pasteurized. An emulsion made from each gum sample was divided into three smaller parts, which were randomly assigned to the levels of a third factor, pH; the pH was adjusted to 2.5, 4.5, or 5.5 using citric acid (Appendix 1: Data: Gum Breakdown Times).

This is a split-plot design in which the gum samples within each batch are the whole plots. The combinations of demineralization and pasteurization of the gum are the large (whole) plot factors. The split plots are the smaller parts of each emulsion, each with a specific pH, which is the only split-plot factor. The response measured (y) was the




Table 7.12 Sources of variation and degrees of freedom

Sources of variation | Degrees of freedom
Demineralization (Des) | a − 1 = 2 − 1 = 1
Pasteurization (Pasteu) | b − 1 = 2 − 1 = 1
Demineralization*pasteurization | (a − 1)(b − 1) = 1
Des*Pasteu(rep) | ab(r − 1) = 2 × 2 × 1 = 4
pH | c − 1 = 3 − 1 = 2
Demineralization*pH | (a − 1)(c − 1) = 2
Pasteurization*pH | (b − 1)(c − 1) = 2
Des*Pasteu*pH | (a − 1)(b − 1)(c − 1) = 2
Error | ab(c − 1)(r − 1) = 2 × 2 × 2 × 1 = 8
Total | r × a × b × c − 1 = 2 × 2 × 2 × 3 − 1 = 23


time to break, i.e., the time (in hours) until the emulsion failed. The sources of variation and degrees of freedom for this experiment are shown in Table 7.12.
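The degrees of freedom in Table 7.12 follow directly from the numbers of levels. The sketch below, built around a hypothetical helper `split_plot_df`, computes them for $a$ demineralization levels, $b$ pasteurization levels, $c$ pH levels, and $r$ batches, and checks that the individual sources add up to the total.

```python
def split_plot_df(a, b, c, r):
    """Degrees of freedom for an a x b whole-plot factorial replicated r times
    in a CRD, with a c-level factor applied to the split plots."""
    df = {
        "Des": a - 1,
        "Pasteu": b - 1,
        "Des*Pasteu": (a - 1) * (b - 1),
        "Des*Pasteu(rep)": a * b * (r - 1),            # whole-plot error
        "pH": c - 1,
        "Des*pH": (a - 1) * (c - 1),
        "Pasteu*pH": (b - 1) * (c - 1),
        "Des*Pasteu*pH": (a - 1) * (b - 1) * (c - 1),
        "Error": a * b * (c - 1) * (r - 1),            # split-plot error
    }
    df["Total"] = r * a * b * c - 1
    return df


table = split_plot_df(a=2, b=2, c=3, r=2)
for source, value in table.items():
    print(f"{source:<16}{value:>3}")

# The individual sources should add up to the total
print(sum(v for k, v in table.items() if k != "Total") == table["Total"])
```

Writing the two error terms separately makes it explicit that whole-plot factors are tested against Des*Pasteu(rep) and split-plot terms against the residual error, which matches the denominator DF (4 and 8) in Table 7.13.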
The components of the GLMM with a Gamma response are as follows: 


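Distributions: $y_{ijkl}\mid r(\alpha\beta)_{ijl}\sim\text{Gamma}(\mu_{ijkl},\phi)$; $i=1,2$; $j=1,2$; $k=1,2,3$; $l=1,2$
$r(\alpha\beta)_{ijl}\sim N(0,\sigma^2_{r(\alpha\beta)})$
Linear predictor: $\eta_{ijkl}=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij}+r(\alpha\beta)_{ijl}+\gamma_k+(\alpha\gamma)_{ik}+(\beta\gamma)_{jk}+(\alpha\beta\gamma)_{ijk}$

where $\alpha_i$, $\beta_j$, and $\gamma_k$ are the fixed effects of demineralization, pasteurization, and pH, respectively; $(\alpha\beta)_{ij}$, $(\alpha\gamma)_{ik}$, $(\beta\gamma)_{jk}$, and $(\alpha\beta\gamma)_{ijk}$ are the two- and three-way interactions of the factors under study; and $r(\alpha\beta)_{ijl}$ is the random effect of batch (rep) $l$ within the demineralization × pasteurization combination $ij$ (the whole-plot error), assuming $r(\alpha\beta)_{ijl}\sim N(0,\sigma^2_{r(\alpha\beta)})$.

Link function: $\log(\mu_{ijkl})=\eta_{ijkl}$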


The GLIMMIX commands for fitting this GLMM are as follows:


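proc glimmix nobound method=laplace;
   class Batch Demineralization Pasteurization pH;
   /* full factorial in the three fixed factors, gamma response with log link */
   model y = Demineralization|Pasteurization|pH / dist=gamma link=log;
   /* whole-plot error: batch within demineralization x pasteurization */
   random Batch(Demineralization*Pasteurization);
   lsmeans Demineralization Pasteurization Demineralization*Pasteurization
           pH Demineralization*pH Pasteurization*pH
           Demineralization*Pasteurization*pH / lines ilink;
run;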
Table 7.13 Results of the analysis of variance

(a) Fit statistics for conditional distribution

−2 Log L (conditional on random effects) | 192.24
Pearson's chi-square | 0.12
Pearson's chi-square/DF | 0.01

(b) Covariance parameter estimates

Cov Parm | Estimate | Standard error
Rep(Desmin*Pasteur) | 0.001428 | 0.001864
Residual (φ) | 0.006011 | 0.002126

(c) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Demineralization (Des) | 1 | 4 | 35.48 | 0.0040
Pasteurization (Pasteu) | 1 | 4 | 19.49 | 0.0116
Demineralization*pasteurization | 1 | 4 | 35.67 | 0.0039
pH | 2 | 8 | 5.27 | 0.0346
Demineralization*pH | 2 | 8 | 3.84 | 0.0676
Pasteurization*pH | 2 | 8 | 0.57 | 0.5889
Des*Pasteu*pH | 2 | 8 | 4.32 | 0.0535


The relevant results from the SAS output are shown in Table 7.13. The conditional Pearson's chi-square/DF of 0.01 indicates that there is no overdispersion in the gamma model. The variance component for reps within demineralization × pasteurization ($\hat\sigma^2_{r(\alpha\beta)}=0.001428$) and the scale parameter ($\hat\phi=0.006011$) are shown in part (b).

The type III tests of fixed effects are presented in part (c) of Table 7.13. Demineralization, pasteurization, and pH, as well as the demineralization × pasteurization interaction, have significant effects on the emulsion breaking time, whereas the demineralization*pH (P = 0.0676) and demineralization*pasteurization*pH (P = 0.0535) interactions are close to significance. The emulsion breaking time is shorter for gum that has not been demineralized (demineralization = 1) or not pasteurized (pasteurization = 1) and, to a lesser extent, depends on the pH to which the emulsion was adjusted (Table 7.14).

Analyzing the simple effects of the factors, we can observe that when the gum has not been pasteurized (B = 1), the average emulsion breaking time is very similar for demineralized and non-demineralized gum at all three pH levels. However, when the gum has been pasteurized, demineralization has a marked impact on the emulsion breaking time: for gum that is pasteurized but not demineralized (A1B2), the breaking time is much lower than for gum that is both demineralized and pasteurized (A2B2) at all three pH levels. Finally, a demineralized, pasteurized gum at pH 4.5 gives the emulsion with the highest breaking stability (Table 7.15).
Table 7.14 Means and standard errors of the main effects on the model scale (Estimate) and the data scale (Mean)

(a) Demineralization least squares means

Demineralization | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
1 | 5.0911 | 0.02930 | 4 | 173.77 | <0.0001 | 162.57 | 4.7628
2 | 5.3379 | 0.02930 | 4 | 182.18 | <0.0001 | 208.07 | 6.0964

(b) Pasteurization least squares means

Pasteurization | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
1 | 5.1230 | 0.02930 | 4 | 174.87 | <0.0001 | 167.84 | 4.9171
2 | 5.3059 | 0.02930 | 4 | 181.08 | <0.0001 | 201.53 | 5.9051

(c) pH least squares means

pH | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
1 | 5.1610 | 0.03050 | 8 | 169.22 | <0.0001 | 174.33 | 5.3171
2 | 5.2839 | 0.03050 | 8 | | <0.0001 | 197.13 | 6.0124
3 | 5.1986 | 0.03052 | 8 | 170.32 | <0.0001 | 181.02 | 5.5255


Table 7.15 Means and standard errors of the simple effects on the model scale (Estimate) and the data scale (Mean)

Demineralization*pasteurization*pH least squares means

A | B | C | Estimate | Standard error | DF | t-value | p-value | Mean | Standard error mean
1 | 1 | 1 | 5.0696 | 0.06099 | 8 | 83.13 | <0.0001 | 159.11 | 9.7035
1 | 1 | 2 | 5.1695 | 0.06105 | 8 | 84.68 | <0.0001 | 175.83 | 10.7339
1 | 1 | 3 | 5.1311 | 0.06100 | 8 | 84.12 | <0.0001 | 169.20 | 10.3204
1 | 2 | 1 | 5.1137 | 0.06099 | 8 | 83.84 | <0.0001 | 166.28 | 10.1419
1 | 2 | 2 | 5.0445 | 0.06098 | 8 | 82.72 | <0.0001 | 155.17 | 9.4623
1 | 2 | 3 | 5.0183 | 0.06098 | 8 | 82.29 | <0.0001 | 151.15 | 9.2170
2 | 1 | 1 | 5.0811 | 0.06103 | 8 | 83.26 | <0.0001 | 160.95 | 9.8225
2 | 1 | 2 | 5.1694 | 0.06099 | 8 | 84.76 | <0.0001 | 175.81 | 10.7225
2 | 1 | 3 | 5.1175 | 0.06110 | 8 | 83.76 | <0.0001 | 166.91 | 10.1978
2 | 2 | 1 | 5.3796 | 0.06100 | 8 | 88.19 | <0.0001 | 216.93 | 13.2320
2 | 2 | 2 | 5.7520 | 0.06100 | 8 | 94.30 | <0.0001 | 314.81 | 19.2031
2 | 2 | 3 | 5.5277 | 0.06106 | 8 | 90.53 | <0.0001 | 251.57 | 15.3607

A = demineralization (1 = no, 2 = yes), B = pasteurization (1 = no, 2 = yes), and C = pH (1 = 2.5, 2 = 4.5, and 3 = 5.5)


7.3 Survival Analysis 


When research focuses on the time at which a specific event occurs, we usually refer to survival times, and the statistical analysis of these times, as mentioned above, is known as survival analysis. A characteristic feature of survival times is the presence of censored times, that is, individuals whose actual survival time is not known.

For a set of survival times (including censored ones) from a sample of individuals, it is possible to estimate the proportion of the population that will survive a given time interval under the same circumstances. The methods used to make this estimate are based on the product-limit estimator proposed by Kaplan and Meier (1958). Together with statistical tests such as the log-rank, Breslow, and Tarone-Ware tests, this method allows the survival of two or more groups of individuals that differ with respect to certain factors to be compared.
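The product-limit idea behind the Kaplan-Meier estimator can be sketched in a few lines of Python: at each distinct event time, the current survival estimate is multiplied by the proportion of individuals at risk who did not experience the event, and censored individuals leave the risk set after their censoring time. This is a minimal illustration that assumes the usual convention that, at tied times, events occur before censorings; the data are hypothetical, and the log-rank and related tests are not implemented.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of S(t).

    times  -- observed time for each individual (event or censoring time)
    events -- 1 if the event was observed, 0 if the time is censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count events and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths > 0:
            # Conditional probability of surviving past t, given at risk at t
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t (event or censored) leaves the risk set
        n_at_risk -= deaths + censored
    return curve


# Hypothetical data: survival times (hours) of 6 insects; 0 marks a censored time
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

The two insects still alive at 8 hours contribute to every risk set up to that time, which is how censored observations add information without being treated as events.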

Survival analysis focuses its interest on a group or several groups of individuals 
for whom an event is defined, which occurs after a time interval. To determine the 
time of interest, there are three requirements: an initial time, a scale to measure the 
passage of time (minutes, hours, days, etc.), and clarity about what is meant by the 
event of interest. 

Survival of an individual is conceptually the probability of being alive at a given time t after a starting point such as diagnosis, initiation of treatment, or complete remission, for a group of individuals. In clinical studies, survival times often refer to the time until death,
development of a particular symptom, or relapse after complete remission of a 
disease. Failure is defined as death, relapse, or the occurrence of a new disease. In 
many survival analyses, when the end of the observation period previously set by the 
investigator is reached, there are individuals to whom the event has not occurred and 
we do not know when it will occur. Therefore, the actual survival time for them is 
unknown, and only the survival time to the end of the study is known. Such survival 
times are called censored times. It also happens, in some cases, that some individuals 
do not continue the study until the end of the analysis period for reasons unrelated to 
the research, e.g., death from other causes; these times are also censored. These 
censored data contribute valuable information and, therefore, should not be omitted 
from the analysis. 

The pharmaceutical and food industries are legally required to label the shelf life 
of their product on the packaging. For pharmaceuticals, the requirements for how to 
determine shelf life are highly regulated. However, the regulatory standards do not 
specifically define shelf life. Instead, the definition is implicit through the estimation 
procedure. Of interest here is the situation in which multiple batches are used to determine a shelf life that applies to all future batches of a product. Because of the variability within and between batches, both shelf life and label life are of great importance. Product development must be very well thought out before a company can have confidence in its shelf life estimates. The company must be able to reliably produce a homogeneous product from batch to batch of ingredients, because physical and chemical factors such as pH, water activity, and uniformity of the mix (moisture distribution, salt, preservative, or food acid) affect the ability of bacteria to grow and, consequently, the shelf life of the product. Therefore, products should be inspected at appropriate times, and samples should be tested for the stability of critical physical and chemical characteristics. These tests also provide an opportunity to begin microbiological testing for spoilage organisms. Testing should continue beyond the intended shelf life unless the product fails earlier, and it should lead to an understanding of the target levels and ranges of ingredients for evaluating the critical physical and chemical characteristics of the product over the intended shelf life.

Survival analysis is the name for a collection of statistical techniques used to describe and quantify the time until the event of interest occurs. The term “survival time” refers to the amount of time until that event occurs. Situations in which survival analysis has been used in epidemiology include:


(a) Survival of insects after having received an insecticide. 
(b) The time taken by cows or ewes to conceive after calving. 
(c) The time taken for a farm to experience its first case of an exotic disease. 


7.3.1 Concepts and Definitions 


To clearly understand and interpret a rate of change calculated from data on the event of interest, a more formal approach is needed. The definition of a rate of change begins with a mathematical function that describes how a quantity changes over time, denoted here by $S(t)$. Dividing the change in the function, from $S(t)$ to $S(t+\Delta t)$, by the corresponding change in time, from $t$ to $t+\Delta t$, produces the rate of change:

$$\text{rate of change}=\frac{\text{change in }S(t)}{\text{change in time}}=\frac{S(t)-S(t+\Delta t)}{(t+\Delta t)-t}=\frac{S(t)-S(t+\Delta t)}{\Delta t}$$

Rates of change with respect to time apply to a variety of situations, but one specific function, traditionally denoted by $S(t)$, is fundamental to the analysis of survival data. It is called the survival function and is defined as the probability of surviving beyond a specific point in time $t$. That is,

$$S(t)=P(\text{surviving the interval }[0,t])=P(\text{surviving beyond time }t)=P(T>t)=1-F(t),$$

where $F(t)=P(T\le t)$ is the cumulative distribution function. Another important concept in survival analysis is the hazard function $h(t)$, defined for the survival time $T$ as
$$h(t)=\lim_{\Delta t\to 0}\frac{P(t\le T<t+\Delta t\mid T\ge t)}{\Delta t},$$

which can be rewritten as

$$h(t)=\lim_{\Delta t\to 0}\frac{F(t+\Delta t)-F(t)}{\Delta t}\cdot\frac{1}{P(T\ge t)}=\frac{f(t)}{S(t)},$$

where $f(t)$ is the probability density function. Any distribution defined on $T\in[0,\infty)$ can serve as a survival distribution. Because $f(t)=-dS(t)/dt$, it follows that

$$h(t)=-\frac{d}{dt}\log S(t).$$

It then follows that

$$S(t)=\exp\{-H(t)\},$$

where $H(t)=\int_0^t h(u)\,du$ is the cumulative hazard function. Equivalently,

$$H(t)=-\log S(t).$$
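These relationships can be checked numerically for any survival distribution. The Python sketch below assumes, purely for illustration, a Weibull distribution with shape 1.5 and scale 10, for which $S(t)=\exp\{-(t/\text{scale})^{\text{shape}}\}$ and $H(t)=(t/\text{scale})^{\text{shape}}$ in closed form. It computes $h(t)$ three ways (as $f(t)/S(t)$, as $-d\log S(t)/dt$ by a central difference, and from the limit definition with a small $\Delta t$) and confirms that $S(t)=\exp\{-H(t)\}$.

```python
import math

# Illustrative Weibull survival model (shape and scale chosen arbitrarily)
shape, scale = 1.5, 10.0

def S(t):
    """Survival function S(t) = exp{-(t/scale)^shape}."""
    return math.exp(-(t / scale) ** shape)

def f(t):
    """Probability density function f(t) = -dS(t)/dt."""
    return (shape / scale) * (t / scale) ** (shape - 1) * S(t)

def H(t):
    """Cumulative hazard H(t) = (t/scale)^shape."""
    return (t / scale) ** shape

def h_ratio(t):
    """Hazard as f(t)/S(t)."""
    return f(t) / S(t)

def h_log_derivative(t, dt=1e-6):
    """Hazard as -d log S(t)/dt, by a central finite difference."""
    return -(math.log(S(t + dt)) - math.log(S(t - dt))) / (2 * dt)

def h_limit(t, dt=1e-6):
    """Hazard from the definition P(t <= T < t + dt | T >= t) / dt with a small dt."""
    return (S(t) - S(t + dt)) / (S(t) * dt)

t = 7.0
print(h_ratio(t), h_log_derivative(t), h_limit(t))
print(S(t), math.exp(-H(t)))
```

The limit-based version is the same ratio $[S(t)-S(t+\Delta t)]/\Delta t$ introduced as a rate of change, divided by $S(t)$, so it converges to $h(t)$ as $\Delta t$ shrinks.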


For the simplest model, the exponential model with $h(t)=\lambda$ ($\lambda$ a constant), the survival function is given by

$$S(t)=e^{-\lambda t},$$

with the probability density function given by

$$f(t)=-\frac{dS(t)}{dt}=\lambda e^{-\lambda t}.$$

Thus, the survival function, hazard function, and cumulative hazard function for the exponential model are:

Survival function: $S(t)=e^{-\lambda t}$
Hazard (risk) function: $h(t)=\dfrac{f(t)}{S(t)}=\dfrac{\lambda e^{-\lambda t}}{e^{-\lambda t}}=\lambda$
Cumulative hazard (risk) function: $H(t)=\int_0^t h(u)\,du=\int_0^t\lambda\,du=\lambda t$


7.3.2 CRD: Aedes aegypti 


The objective of this experiment was to test the vulnerability of Aedes aegypti 
mosquitoes to different fungal treatments (four treatments). A bioassay was 
conducted to determine the survival time of each of the mosquitoes. Three-day-old 
mosquitoes were maintained after hatching in 45-cm rearing cages with access to 
water but not food. The mosquitoes were kept in rearing cages with water and fed 
warm pig blood (37 °C) through a natural membrane (sausage casing) approximately 
every 3 days and allowed to oviposit freely during the waiting period. A total of 
10 mosquitoes were placed in each chamber, and each chamber received one of the four treatments or the control. Here, we present part of the data from a bioassay with four
replicates. The complete data from this trial can be found in the Appendix 1 (Data: 
Aedes aegypti). 


Treatment Rep Y 
C 1 8
C 1 11
C 4 20 
Mam 1 2 
Mam 1 2 
MaS 1 3 
MaS 1 3 
MaC 1 2 
MaC 1 2 
MaC 1 2 
Mal 1 2 
Mal 1 2 
Mal 4 11 
Table 7.16 Results of the analysis of variance

(a) Fit statistics for conditional distribution

−2 Log L (T, conditional on random effects) | 716.70
Pearson's chi-square | 35.33
Pearson's chi-square/DF | 0.18

(b) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Trt | 4 | 192 | 186.42 | <0.0001


The components of this GLMM are as follows:

Distributions: $y_{ij}\mid \text{rep}_j\sim\text{Gamma}(\mu_{ij},\phi)$
$\text{rep}_j\sim N(0,\sigma^2_{\text{rep}})$
Linear predictor: $\eta_{ij}=\eta+\tau_i+\text{rep}_j$

Link function: $\eta_{ij}=\log(\mu_{ij})$

where $\eta$ is the intercept, $\tau_i$ is the treatment effect, and $\text{rep}_j$ is the random effect of mosquito chamber $j$, assuming $\text{rep}_j\sim N(0,\sigma^2_{\text{rep}})$.


The following GLIMMIX commands fit a GLMM with a gamma response:


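proc glimmix data=mosquitos method=laplace;
   class bio trt rep;
   model y = trt / dist=gamma link=log;   /* gamma response, log link */
   random rep;                            /* random mosquito chamber effect */
   lsmeans trt / lines ilink;             /* means on model and data scales */
run;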


Part of the output is shown in Table 7.16. The fit statistic in part (a) (Pearson's chi-square/DF = 0.18) indicates that there is no overdispersion in the fit of the data. The analysis of variance (type III tests of fixed effects, part (b)) indicates a highly significant effect (P < 0.0001) of the fungal treatments on mean mosquito survival time.

The relevant information in the lsmeans of Table 7.17 comes from the columns labeled “Estimate” and “Mean,” which give the estimates on the model scale and the data scale, respectively; the average survival time in each treatment is $\hat\mu_i$ (± standard error).

Under an exponential model, the estimated hazard function for each treatment is $\hat h_i(t)=\hat\lambda_i=1/\hat\mu_i$. For example, for treatment Mal, the estimated hazard is $\hat\lambda=1/3.4223=0.2922$. We can calculate these values manually from the “Mean” column, or we can automate the process by adding the statement “ods output lsmeans=mu;” to the GLIMMIX
Table 7.17 Means and standard errors of the main effects on the model scale (Estimate) and the
data scale (Mean)

Trt least squares means

Trt       Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean
Mal       1.2303    0.06354         192  19.36    <0.0001   3.4223   0.2174
MaC       0.9562    0.06350         192  15.06    <0.0001   2.6017   0.1652
MaS       1.5798    0.06357         192  24.85    <0.0001   4.8542   0.3086
Mam       0.6946    0.06350         192  10.94    <0.0001   2.0029   0.1272
Control   2.7155    0.06362         192  42.68    <0.0001   15.1126  0.9615


program above. Once the treatment means are saved, SAS can compute the estimated
hazard function for each treatment with the following commands:


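data hazard;
  set mu;           /* lsmeans table saved with ods output; Mu is the data-scale mean */
  hazard = 1/mu;    /* estimated hazard: lambda_i = 1/mu_i */
run;

proc print data=hazard;
run;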


The results are listed in Table 7.18. The hazard column contains the
estimated hazard function for each treatment, $h_i(t) = \hat{\lambda}_i$.
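The data step above only takes reciprocals of the data-scale means. As a cross-check outside SAS, the same arithmetic can be sketched in Python from the Mean column of Table 7.17; the dictionary is typed in by hand, so it is an assumption that it matches the saved `mu` data set.

```python
# Data-scale means (mean survival time) from the "Mean" column of Table 7.17
means = {"Mal": 3.4223, "MaC": 2.6017, "MaS": 4.8542, "Mam": 2.0029, "Control": 15.1126}

# Estimated (constant) hazard for each treatment: lambda_i = 1 / mu_i
hazards = {trt: 1.0 / mu for trt, mu in means.items()}

# List treatments from the highest to the lowest estimated hazard
for trt, lam in sorted(hazards.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{trt:8s} hazard={lam:.4f}")
```

A larger hazard means a shorter expected survival, so the ranking printed here runs from the most to the least lethal treatment.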


From the values $\hat{\lambda}_i$, we can calculate the estimated survival function $\hat{S}_i(t) = e^{-\hat{\lambda}_i t}$
for each of the treatments. Figure 7.3 shows the resulting probability of survival over time
for each of the proposed treatments and the control.
Clearly, the treatments MaS, Mal, MaC, and Mam showed greater efficacy than the control in
the biological control of these mosquitoes.
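The survival curves follow directly from $\hat{S}_i(t) = e^{-\hat{\lambda}_i t}$ with $\hat{\lambda}_i = 1/\hat{\mu}_i$. The Python sketch below evaluates them on an arbitrary time grid (in the units of the response Y) using the Table 7.17 means; it is an illustration, not the code used to draw Fig. 7.3.

```python
import math

# Data-scale means from Table 7.17; hazards are their reciprocals
means = {"Mal": 3.4223, "MaC": 2.6017, "MaS": 4.8542, "Mam": 2.0029, "Control": 15.1126}
hazards = {trt: 1.0 / mu for trt, mu in means.items()}

def survival(t, lam):
    """Estimated survival probability S(t) = exp(-lambda * t)."""
    return math.exp(-lam * t)

times = list(range(0, 31, 5))  # illustrative grid, same units as Y
curves = {trt: [survival(t, lam) for t in times] for trt, lam in hazards.items()}

for trt, probs in curves.items():
    print(f"{trt:8s} " + " ".join(f"{p:.3f}" for p in probs))
```

Because the control has the smallest hazard, its curve stays above every treatment curve for all t > 0.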


7.3.3 RCBD: Aedes aegypti 


Similar to the previous example, this experiment tested the vulnerability
of Aedes aegypti mosquitoes to different fungal treatments (four treatments). For
this, two bioassays were conducted to determine the survival time of each
mosquito. Three-day-old mosquitoes were maintained after hatching in 45-cm
rearing cages with access to water but not food. Mosquitoes were maintained in
rearing cages with water and were fed warm pig blood (37 °C) through a natural
membrane (sausage casing) approximately every 3 days. They were allowed to
oviposit freely during the waiting period. A total of 10 mosquitoes were placed in
each chamber, to which one of the four treatments or the control was applied. The data
can be found in Appendix 1 (Data: Aedes aegypti).
Fig. 7.3 Estimated survival probability for each treatment


The components of this GLMM are as follows: 


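Distributions: $y_{ijk} \mid \mathrm{bio}_j, \mathrm{rep(bio)}_{k(j)} \sim \mathrm{Gamma}(\mu_{ijk}, \phi)$

$\mathrm{bio}_j \sim N(0, \sigma^2_{\mathrm{bio}})$, $\mathrm{rep(bio)}_{k(j)} \sim N(0, \sigma^2_{\mathrm{rep(bio)}})$

Linear predictor: $\eta_{ijk} = \eta + \tau_i + \mathrm{bio}_j + \mathrm{rep(bio)}_{k(j)}$

where $\eta$ is the intercept, $\tau_i$ is the treatment effect, and $\mathrm{bio}_j$ and $\mathrm{rep(bio)}_{k(j)}$ are the random
effects of the bioassay and of the mosquito chamber within the bioassay, respectively,
assuming $\mathrm{bio}_j \sim N(0, \sigma^2_{\mathrm{bio}})$ and $\mathrm{rep(bio)}_{k(j)} \sim N(0, \sigma^2_{\mathrm{rep(bio)}})$.

Link function: $\eta_{ijk} = \log(\mu_{ijk})$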


The following GLIMMIX program fits a block GLMM with a gamma response. 


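proc glimmix method=laplace nobound;
  class bio trt rep;
  model y = trt / dist=gamma;
  random bio rep(bio);            /* bioassay (block) and chamber-within-bioassay effects */
  ods output lsmeans=mu;          /* save the least squares means for the hazard calculation */
  lsmeans trt / lines ilink;
run; quit;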


The results are shown below. Part of the fit statistics and the variance
components are listed in Table 7.19. In part (a), the value of
Table 7.19 Results of the analysis of variance

(a) Fit statistics for conditional distribution
-2 log L (y | r. effects)      3303.50
Pearson's chi-square            202.30
Pearson's chi-square/DF           0.34

(b) Covariance parameter estimates
Cov Parm   Estimate  Standard error
BIO        0.1859    0.1936
REP(BIO)   0.02562   0.01673
Residual   0.2822    0.01568

(c) Type III tests of fixed effects
Effect  Num DF  Den DF  F-value  Pr > F
TRAT    4       588     115.36   <0.0001


Table 7.20 Means and standard errors of the main effects on the model scale (Estimate) and the
data scale (Mean)

TRT least squares means

TRT       Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean
Mal       1.6344    0.3140          588  5.21     <0.0001   5.1266   1.6097
MaC       1.4903    0.3140          588  4.75     <0.0001   4.4386   1.3939
MaS       1.8788    0.3140          588  5.98     <0.0001   6.5455   2.0550
Mam       1.8053    0.3143          588  5.74     <0.0001   6.0820   1.9115
Control   2.8293    0.3139          588  9.01     <0.0001   16.9329  5.3153


Table 7.21 Means and standard errors of the main effects on the model scale (Estimate), the data
scale (Mean), and the hazard function $\hat{\lambda}_i$

TRT       Estimate  Standard error  DF   t-value  Pr > |t|  Mean     Standard error mean  Hazard $\hat{\lambda}_i$
Mal       1.6344    0.3140          588  5.21     <0.0001   5.1266   1.6097               0.19506
MaC       1.4903    0.3140          588  4.75     <0.0001   4.4386   1.3939               0.22529
MaS       1.8788    0.3140          588  5.98     <0.0001   6.5455   2.0550               0.15278
Mam       1.8053    0.3143          588  5.74     <0.0001   6.0820   1.9115               0.16442
Control   2.8293    0.3139          588  9.01     <0.0001   16.9329  5.3153               0.05906


Pearson's chi-square/DF = 0.34 suggests no overdispersion. In part (b), the estimated variance
components due to blocks (bioassays), replicates within blocks, and experimental error are
$\hat{\sigma}^2_{\mathrm{bio}} = 0.1859$, $\hat{\sigma}^2_{\mathrm{rep(bio)}} = 0.02562$, and $\hat{\sigma}^2 = 0.2822$, respectively. The type III
tests of fixed effects (part (c)) indicate a highly significant difference
between treatments in mean survival time (P < 0.0001).

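Tables 7.20 and 7.21 show the estimates on the model scale and the data scale:
the linear predictors ($\hat{\eta}_i$), the means ($\hat{\mu}_i$) with their respective standard errors, and the
estimated hazard function. The results indicate that MaC has the greatest lethal effect on
A. aegypti mosquitoes: it has the shortest estimated mean survival time and the highest
estimated hazard, whereas the control has the longest survival time and the lowest hazard.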
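The hazard column of Table 7.21 is the reciprocal of the Mean column of Table 7.20. The Python sketch below reproduces it and, as an extra illustration derived from the same exponential survival formula $\hat{S}_i(t) = e^{-\hat{\lambda}_i t}$, also reports the time at which the estimated survival probability drops to 0.5, $t_{0.5} = \ln 2/\hat{\lambda}_i$. This quantity does not appear in the original output, and the variable names are ours.

```python
import math

# Data-scale means from Table 7.20 (RCBD analysis)
means = {"Mal": 5.1266, "MaC": 4.4386, "MaS": 6.5455, "Mam": 6.0820, "Control": 16.9329}

hazards = {trt: 1.0 / mu for trt, mu in means.items()}       # Table 7.21 hazard column
t50 = {trt: math.log(2) / lam for trt, lam in hazards.items()}  # solves exp(-lambda * t) = 0.5

most_lethal = max(hazards, key=hazards.get)
print("treatment with the highest hazard:", most_lethal)
for trt in means:
    print(f"{trt:8s} hazard={hazards[trt]:.5f}  t50={t50[trt]:.2f}")
```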
Fig. 7.4 Estimated survival probability for each treatment


Figure 7.4 shows the estimated survival probabilities over time for the different treatments
tested. These curves were obtained with $\hat{S}_i(t) = e^{-\hat{\lambda}_i t}$.


7.4 Exercises 


Exercise 7.4.1 This experiment investigated the time to incapacitation of animals
exposed to the burning of eight types of aircraft interior materials (M1-M8), together
with the yields (in milligrams per gram of material) of seven combustion gases
(CO, HCN, H2S, HCl, HBr, NO2, SO2) (Spurgeon 1978). The recorded incapacitation
time of the animal when exposed to the different combustion materials (column
"Material") is given in the column "Time" (in minutes), and the third column gives
the value of 1000/Time; these data are shown below (Table 7.22):


(a) Write down a statistical model of this experiment. 

(b) List all the components of the GLMM in (a). 

(c) Write down the null and alternative hypotheses associated with this experiment. 

(d) Construct an ANOVA table indicating the sources of variation and degrees of 
freedom. 

(e) Analyze the time to incapacitation of the animals exposed to the gases of the
different types of materials.

(f) Comment on the results obtained. 
Table 7.22 Time to incapacitation of the animal when exposed to different combustion gases


Material | Time | 1000/Time | CO | HCN | H2S | HCl | HBr | NO2 | SO2
М1 236 |4237 164 6.4 0 0 0 0.26 0 
М1 238 |4202 174 7.5 0 0 5 1.07 0 
М1 2.61 |3831 96 4.7 0 33 5 0.08 0 
MI 3.07 |325.7 101 7.5 0 0 7.1 10.43 0 
MI 3.07 |325.7 142 6.8 0 27.6 0 0.25 0 
MI 319 |313.5 143 8.2 0 0 0.33 0 
MI 3.7 270.3 147 5.2 0 11.3 0.37 0 
MI 3.9 256.4 156 4.7 0 12 0.39 0 
MI 418 |2392 124 3.2 0 23.3 0.2 0 
М1 4.7 212.8 101 8.9 0.9 5.4 0.63 0 
MI 4.86 205.8 142 4.6 0 19.4 0.19 0 
MI 5.58 179.2 104 34 0 80 0.15 0.4 
MI 5.85 170.9 90 2.3 0 344 0.09 1.2 
M2 322 | 310.6 159 16.4 0 0 2 0 
M2 3.80 |2571 153 2.9 0 0 0.15 0 
M2 4.79 | 208.8 161 0.6 0 0 0.62 0 
7/2 5.07 197.2 159 0 0 4.6 0.04 0 
7/2 522 | 191.6 162 0 0 22 0.04 0 
M2 5.82 171.8 106 3.2 0 45.2 0.08 0 
M2 6.09 164.2 124 1.5 0 0 0.85 0 
М2 8.36 119.6 89 0.7 0 0 0.29 0 
М2 13.02 76.8 88 0 0 0 0.02 0 
M3 429 |23341 129 6 0 4.2 0.02 0.7 
M3 4.8 208.3 105 5.8 0 0 0.03 0 
M3 5.04 | 198.4 108 7.8 0 7.3 0.04 0 
M3 5.06 197.6 120 |116 0 23 0.02 0 
M3 5.25 190.5 149 0 0 8.6 0 0 
M3 5.5 181.8 28 9.1 0.4 56.2 0 2.2 
M3 5.55 180.2 83 5 0 0 0.02 0 
M3 7.55 132.5 68 5.5 0 27.3 0.01 0.9 
M3 9.58 104.4 28 2.4 2; 137 0 16.6 
M4 115 | 869.6 88 |624 0 182 0.52 24 
M4 2 500 89 |417 13.4 0 0 0.3 
M4 215 | 465.1 63 14.9 0 0 1.6 8.5 
M4 2.22 | 450.5 112 |372 14.2 0 0 1.5 
M4 223 | 448.4 96 И; 0 43.1 0.53 11.2 
M4 2.72 | 367.6 78 | 33.8 13.9 0 0 0 
M4 2.93 | 341.3 348 1.9 0 28 1 1.8 
M4 3.07 | 325.7 255 1.9 0 0 0.57 0 
M4 347 | 288.2 112 19.5 10.7 88 0.03 4.8 
M4 4.18 |2392 144 3.8 0 14.5 0.39 0.9 
M4 4.64 | 215.5 70 |112 62 1205 0.04 4.9
M4 ПЭТ 132.1 92 0 03 |536 0.01 3 
М5 6.97 143.5 114 0 0 114 0 0 
М5 7.47 133.9 103 0 0 221 0 0 
M5 10.7 93.5 70 0 0 259 0.02 1.4 
M5 13.71 72.9 56 0 0 220 0.01 0.9 
M6 4.94 | 202.4 94 6.7 0 0 0.32 0 
M6 5.26 190.1 55 14.9 5.3 21.9 0 2.2 
М6 5.53 180.8 46 | 13.5 6.1 24.9 0 2.5 
M6 7.46 134 77 31 0 158 0.04 0 
М6 9.84 |101.6 52 41 07 19 0.01 14 
М6 10.9 91.7 41 2.4 0 82 0 0 
М7 3.7 270.3 398 0 0 0 0 0 
M7 3.8 263.2 345 0 0 0 0.01 0 
М7 3.63 |2611 406 0 0 0 0 0 
М7 4.04 247.5 342 0 0 23 0.04 0 
М7 5.19 192.7 196 0 0 0 0 0 
М7 6.01 166.4 148 0 02 | 387 0.01 1.9 
M7 7.56 132.3 86 0 0 0 0 0 
M7 9.41 106.3 54 2.2 0 197 0 0 2.6 
M7 9.59 104.3 55 1.7 0 321 0 0 1.1 
M7 10.79 92.7 55 4.1 0 162 0 0.02 2.9 
М8 3.7 270.3 0 |15 0 0 0 0.34 0 
М8 399 250.6 90 8.6 0 88 0 0.59 0 
М8 6.56 152.4 37 3.1 0 27.7 0 0.01 0 
М8 7.68 130.2 66 0 0 105 0 0 0 
М8 9.16 109.2 45 0 0 0 0 0.01 0 
М8 10.33 96.8 62 0 0 61 0 0.01 0 
М8 12.26 81.6 31 2.7 0 0 0 0.22 0 
M8 14.96 66.8 9 0 0 0 0 0.01 0 


Exercise 7.4.2 Cockroaches are responsible for 80% of infestations in spaces used
by humans. They live in close association with humans and can contaminate food
with their feces and secretions, which has both medical and economic implications.
Various insecticides, mainly synthetic, have been formulated and, in some cases,
have led to the development of resistance in cockroaches. This example deals with the
survival time in days (y) of this insect when exposed to two fungi that are promising
for its biological control, plus a control (Test). The data for this
example are shown below (Table 7.23):
Table 7.23 Results of the cockroach biological control experiment


Insect | Strain | Age Time Insect | Strain | Age | Time Insect | Strain | Age | Time 
1 1 1 1 2 1 1 2 
2 1 2 1 2 2 1 20 
3 1 3 1 2 3 1 20 
4 1 4 1 3 4 1 20 
5 1 5 1 3 5 1 20 
6 1 6 1 3 6 1 20 
7 1 7 1 3 7 1 20 
8 1 8 1 4 8 1 20 
9 1 9 1 5 9 1 20 
10 1 10 1 8 10 1 20 
11 1 11 1 9 11 1 20 
12 1 12 1 10 12 1 20 
13 1 13 1 11 13 1 20 
14 1 14 1 20 14 1 20 
15 1 15 1 20 15 1 20 
16 1 1 1 20 
17 1 1 1 20 
18 1 1 1 20 
19 1 1 1 20 
20 1 1 1 20 
21 2 2 2 20 
22 2 2 2 20 
23 2 2 2 20 
24 2 2 2 20 
25 2 2 2 20 
26 2 2 2 20 
27 2 2 2 20 
28 2 2 2 20 
29 2 2 2 20 
30 2 2 2 20 
31 2 2 2 20 
32 2 2 2 20 
33 2 2 2 20 
34 2 2 2 20 
35 2 2 2 20 
36 2 2 2 20 
37 2 2 2 20 
38 2 2 2 20 
39 2 2 2 20 
40 2 2 2 20 
41 3 3 3 11 
42 3 3 3 20
43 3 3 3 20 
44 3 3 3 20 
45 3 3 3 20 
46 3 3 3 20 
47 3 3 3 20 
48 3 3 3 20 
49 3 3 3 20 
50 3 3 3 20 
51 3 3 3 20 
52 Bbl 3 14 52 Bb2 3 13 52 Test |3 20 
53 Bbl 3 16 53 Bb2 3 14 53 Test |3 20 
54 Bbl 3 17 54 Bb2 3 15 54 Test |3 20 
55 Bbl 3 17 55 Bb2 3 15 55 Test |3 20 
56 Bbl 3 20 56 Bb2 3 15 56 Test |3 20 
57 Bbl 3 20 57 Bb2 3 15 57 Test |3 20 
58 Bbl 3 20 58 Bb2 3 19 58 Test |3 20 
59 Bbl 3 20 59 Bb2 3 20 59 Test |3 20 
60 Bbl 3 20 60 Bb2 3 20 60 Test |3 20 


(a) Write down a statistical model of this experiment. 

(b) List all components of the GLMM from (a).

(c) Write down the null and alternative hypotheses associated with this experiment. 

(d) Analyze the survival time of the insect when infected with the different types of 
fungi. 

(e) Comment on the results obtained. 


Exercise 7.4.3 Consider a study on the effect of analgesic treatments (Trt) in elderly
patients with neuralgia. Two test treatments (A and B) and a placebo (P) are
compared. The response variable is whether or not the patient reported pain
(yes = 1, no = 0). The investigators recorded the age (E) and sex (S) of 60 patients
and the duration (time, T) until the pain disappeared after the start of
treatment. The data are presented in Table 7.24 below.


(a) List all components of the GLMM for this exercise. 

(b) Write down the null and alternative hypotheses associated with this experiment. 

(c) Construct an ANOVA table indicating the sources of variation and degrees of 
freedom. 

(d) Analyze the average time during which the patient experiences pain after starting 
the treatment. Are there any significant differences? 

(e) Comment on the results obtained. 
Table 7.24 Results with neuralgia patients (Trt = Treatment, S = Sex, E = Age, T = Time,
D = Pain with yes = 1 and no = 0)


Trt S E T D   Trt S E T D   Trt S E T D
P F 68 1 0 B M |74 16 0 P F 67 |30 0 
Р М |66 |26 1 B F 67 28 |0 B F 77 16 0 
А Е 71 12 0 В Е 72 |50 0 В Е 76 9 1 
А M |71 17 1 А Е 63 27 |0 А Е 69 18 1 
B F 66 12 |0 A M 6 42 0 P F 64 1 1 
А Е 64 17 |0 Р М |74 4 |0 A F 72 |25 |0 
Р М |70 1 1 B M |66 19 0 B M |59 |29 |0 
А Е 64 |30 JO А М |70 28 0 A M |69 1 0 
В Е 78 1 0 Р М 183 1 1 В Е 69 42 |0 
В M 75 30 1 Р M 77 29 1 Р Е 79 |20 1 
А М |70 12:10 А Е 69 12 0 В Е 65 14 0 
В М |70 1 0 B M 6 |23 0 A M |76 |25 1 
Р М 178 12 1 В M |77 1 1 В Е 6 24 |0 
P M |66 4 1 P F 65 29 |0 Р M |60 26 1 
A М 178 15 1 В М 175 21 1 А Е 67 11 0 
Р Е 72 |27 |0 Р Е 70 13 1 А М |75 6 1 
B F 65 7 |0 P F 68 |27 1 Р M |68 11 1 
Р M |67 17 1 B M |70 |22 0 А М (|65 15 |0 
Р Е 67 1 1 А M |67 10 0 Р Е 72 11 1 
А Е 74 1 0 B M |80 |21 1 А Е 69 з 10 


Exercise 7.4.4 Refer to the previous exercise and perform an analysis of
covariance.


(a) List the linear predictor of this experiment. 

(b) Analyze the average time during which the patient experiences pain after starting 
the treatment using an analysis of covariance. Are there any significant 
differences? 

(c) Comment on the results obtained. Do your results differ from those obtained in the
previous exercise?


Appendix 1 


Data: Onset and duration of estrus in Pelibuey ewes (age in weeks, weight in kilograms, 
Inestro = number of days from the onset of estrus, Durestro = number of days in the duration of 
estrus) 


Animal Birth type Age Weight CC Inestro Durestro 
18.5096 4 

2 1 18.4438 4 

3 1 19.3973 50.2 4 16 20 

4 1 19.3973 53.6 4 28 16 
5 1 9.4356 47.5 4 28 4 
6 1 18.674 41.3 3 28 4 
7 1 20.0877 60.5 5 28 4 
8 1 19.5616 49.4 4 28 4 
9 1 19.5288 53.4 4 28 4 
10 1 19.7589 52.6 5 28 4 
11 1 29.9507 35.5 3 28 4 
12 1 19.3644 50.5 4 28 4 
13 1 19.0027 62.2 5 28 4 
14 1 18.3452 54.7 4 28 4
15 1 20.0877 48.7 4 28 4 
1 2 40.7671 40.5 3 28 4 
2 2 51.189 49.3 4 12 8 
3 2 40.0767 38.1 3 20 20 
4 2 54.3123 41.9 3 24 8 
5 2 52.274 58 4 28 4
6 2 53.6219 34.8 3 28 4
7 2 40.2082 40.3 2 24 8
8 2 36.4932 34.6 2 28 4 
9 2 50.6301 42.1 2 28 4 
10 2 51.0247 52.6 4 28 4 
11 2 46.389 32.1 2 20 12 
12 2 50.7945 40 2 16 16 
13 2 30.411 37.9 2 24 8 
14 2 30.5096 42.2 3 20 20 
15 2 50.6959 33.2 2 24 16 
16 2 36.6247 34.2 3 20 20 
17 2 30.5425 39 2 12 32 
18 2 36.6247 33.7 2 24 16 
19 2 29.9507 32.9 2 24 16 
20 2 47.211 39.5 2 32 12 
21 2 40.2082 57.5 5 12 32 
22 2 52.2411 53.3 4 12 28 
23 2 53.4247 43.4 3 12 32 
24 2 55.5616 46 3 24 16 
25 2 30.5425 31.6 2 24 16 
26 2 29.0959 47.8 3 20 20 
27 2 40.1425 36 2 20 20
28 2 50.7945 42.2 3 24 16 
29 2 37.6767 44.3 3 24 16 
30 2 36.4274 43.1 2 20 20 
31 2 30.5425 38 2 20 20 
1 3 68.9753 42.9 2 24 8 
2 3 63.1233 44 4 24 8 
3 3 68.7781 38.5 3 20 12 
4 3 64.3068 48 4 24 8 
5 3 68.6795 40.1 2 20 12 
6 3 62.6301 46.3 3 32 4 
7 3 69.8959 32.5 2 20 12 
8 3 69.6 42.8 3 20 12 
9 3 63.4849 51.3 4 24 8 
10 3 64.274 47.7 3 24 16 
11 3 63.5178 44.5 3 12 28 
12 3 78.7397 38 2 12 28 
13 3 64.537 52.5 4 12 28 
14 3 62.4329 41.2 2 12 28 
15 3 67.6603 50.8 4 20 20 
16 3 63.7151 48.2 3 20 24 
17 3 74.4986 33.3 2 32 8 
18 3 63.6493 45.1 3 24 16 
19 3 72.9205 33 3 24 20 
20 3 69.4027 40.4 3 24 16 
21 3 69.9616 43.3 3 12 28 
22 3 69.6 43.2 2 24 16 
23 3 63.4849 51 4 24 16 
24 3 63.6164 57.4 4 24 16 
25 3 67.8575 43 3 24 16 
26 3 63.6822 49.7 4 24 16 
27 3 65.7534 40.1 3 24 16 
28 3 67.989 33.4 1 20 20 
29 3 61.1836 51.6 4 20 20 
30 3 63.3534 43.3 3 20 20 
31 3 79.8904 44.7 3 24 16 
32 3 63.7151 37.9 3 20 20 

Data: Beetles 
Dose Insecticide Rep Frac Time 
Low Insec1 1 0.31 3.1
Low Insec2 1 0.82 8.2
Low Insec3 1 0.43 4.3
Low Insec4 1 0.45 4.5
Medium Insec1 1 0.36 3.6
Medium Insec2 1 0.92 9.2
Medium Insec3 1 0.44 4.4
Medium Insec4 1 0.56 5.6
High Insec1 1 0.22 2.2
High Insec2 1 0.3 3
High Insec3 1 0.23 2.3
High Insec4 1 0.3 3
Low Insec1 2 0.45 4.5
Low Insec2 2 1.1 11
Low Insec3 2 0.45 4.5
Low Insec4 2 0.71 7.1
Medium Insec1 2 0.29 2.9
Medium Insec2 2 0.61 6.1
Medium Insec3 2 0.35 3.5
Medium Insec4 2 1.02 10.2
High Insec1 2 0.21 2.1
High Insec2 2 0.37 3.7
High Insec3 2 0.25 2.5
High Insec4 2 0.36 3.6
Low Insec1 3 0.46 4.6
Low Insec2 3 0.88 8.8
Low Insec3 3 0.63 6.3
Low Insec4 3 0.66 6.6
Medium Insec1 3 0.4 4
Medium Insec2 3 0.49 4.9
Medium Insec3 3 0.31 3.1
Medium Insec4 3 0.71 7.1
High Insec1 3 0.18 1.8
High Insec2 3 0.38 3.8
High Insec3 3 0.24 2.4
High Insec4 3 0.31 3.1
Low Insec1 4 0.43 4.3
Low Insec2 4 0.72 7.2
Low Insec3 4 0.76 7.6
Low Insec4 4 0.62 6.2
Medium Insec1 4 0.23 2.3
Medium Insec2 4 1.24 12.4
Medium Insec3 4 0.4 4
Medium Insec4 4 0.38 3.8
High Insec1 4 0.23 2.3
High Insec2 4 0.29 2.9
High Insec3 4 0.22 2.2
High Insec4 4 0.33 3.3
Block Demineralization Pasteurization pH y
1 2 2 1 198.5 
1 2 2 2 299 
1 2 2 3 223.1 
1 1 1 1 166.6 
1 1 1 2 196.5 
1 1 1 3 178.9 
1 1 2 1 160.7 
1 1 2 2 151.1 
1 1 2 3 146.5 
1 2 1 1 146.3 
1 2 1 2 169.3 
1 2 1 3 198.1 
2 2 2 1 236.3 
2 2 2 2 330.7 
2 2 2 3 281.2 
2 1 1 1 151.8 
2 1 1 2 156 
2 1 1 3 159.7 
2 1 2 1 171.8 
2 1 2 2 159.3 
2 1 2 3 155.9 
2 2 1 1 175.2 
2 2 1 2 182.2 
2 2 1 3 136.2 
Data: Aedes aegypti (Trt = treatment, Rep = repetition, Y = survival time)
Trt Rep Y   Trt Rep Y   Trt Rep Y   Trt Rep Y   Trt Rep Y
Control |1 8 Mam |1 2 |MaS 1 3 1 2 1 2. 
Control |1 11 Mam |1 2 Маз | 3 1 2 1 2 
Control |1 11 Mam |1 2 Маз | 3 1 2 1 2 
Control |1 11 Mam |1 2 Маз | 3 1 2 1 2 
Control |1 11 Mam |1 2 Маз | 4 1 3 1 3 
Control |1 11 Mam |1 2 Маз | 5 1 3 1 3 
Control |1 13 Mam |1 2 Маз | 6 1 3 1 3 
Control |1 13 Mam |1 2 Маз | 6 |МаС |1 3 Mal |1 3 
Control |1 14 Mam |1 2 |MaS 1 9 |МаС |1 3 Mal |1 6 
Control |1 20 Mam |1 2 Маз | 12 |MaC |1 4 Mal 11 12 
Control |2 8 Mam |2 2 |MaS |2 3 |MaC |2 2 Mal |2 2 
Control |2 11 Mam |2 2 |MaS |2 3 MaC |2 2 Mal |2 2 
Control |2 11 Mam |2 2 |MaS |2 3 Мас |2 2 Mal |2 2 
Control |2 11 Mam |2 2 MaS |2 3 Мас |2 2 Mal |2 3 
Control | 2 11 Mam |2 2 |MaS |2 3 2 2 2 3 
Control | 2 11 Mam |2 2 |MaS |2 3 2 2 2 3 
Control | 2 15 | Mam | 2 2 |MaS |2 3 2 3 2 3 
Control |2 15 Mam |2 2 |MaS |2 4 2 3 2 4 
Control |2 15 Mam |2 2 |MaS |2 5 2 3 2 4 
Control |2 16 Mam |2 2 |Ма5 |2 6 2 4 2 4 
Control |3 11 | Mam |3 2 |MaS |3 3 3 2 3 2 
Control |3 11 | Mam |3 2 |MaS |3 3 3 2 3 2 
Control |3 11 | Mam |3 2 |MaS |3 3 3 2 3 2 
Control |3 11 | Mam |3 2 |MaS |3 2 3 2 3 2 
Control |3 23 | Mam |3 2 |MaS |3 2 3 3 3 3 
Control |3 25 | Mam |3 2 |MaS |3 5 3 3 3 3 
Control |3 26 | Mam |3 2 |MaS |3 5 3 3 3 3 
Control |3 27 |Mam |3 2 |MaS |3 6 3 3 3 3 
Control |3 30 | Mam |3 2 |MaS |3 10 3 4 3 4 
Control |3 30 | Mam |3 2 [Mas |3 12 3 4 3 4 
Control |4 8 Mam |4 2 | Ма5 |4 3 4 2 4 2 
Control |4 8 Mam |4 2 | Ма |4 3 4 2 4 2 
Control |4 11 Mam |4 2 |Ма5 |4 3 4 2 4 2 
Control |4 13 Mam |4 2 |Ма5 |4 4 4 2 4 3 
Control |4 14 Mam |4 2 |Ма5 |4 4 4 2 4 3 
Control |4 19 Mam |4 2 |Маѕ |4 5 MaC |4 2 Mal |4 3 
Control |4 20 Mam |4 2 | Ма |4 5 MaC |4 3 Mal |4 4 
Control |4 20 Mam |4 2 |Ма5 |4 6 |MaC |4 3 Mal |4 5 
Control |4 20 Mam |4 2 | Ма |4 9 |MaC |4 3 Mal |4 6 
Control |4 22 Mam |4 2 |Ма5 |4 12 |MaC |4 3 Mal |4 11 


Data: Aedes aegypti (Bio = bioassay, Trt = treatment, Rep = repetition, Y = survival time) 


Bio Trt Rep Y   Bio Trt Rep Y   Bio Trt Rep Y

B1 C 1 11   B1 MaS 3 3    B2 C 1 7
B1 C 1 11   B1 MaS 3 3    B2 C 1 8
B1 C 1 11   B1 MaS 3 2    B2 C 1 8
B1 C 1 11   B1 MaS 3 2    B2 C 1 10
B1 C 1 11   B1 MaS 3 5    B2 C 1 13
B1 C 1 13   B1 MaS 3 5    B2 C 1 14
B1 C 1 13   B1 MaS 3 6    B2 C 1 16
B1 C 1 14   B1 MaS 3 10   B2 C 1 20
B1 C 1 20   B1 MaS 3 12   B2 C 1 22
B1 C 2 8    B1 MaS 4 3    B2 C 1 22
B1 C 2 11   B1 MaS 4 3    B2 C 1 23
B1 C 2 11   B1 MaS 4 3    B2 C 1 23
B1 C 2 11   B1 MaS 4 4    B2 C 1 23
Bio Trt Rep Y   Bio Trt Rep Y   Bio Trt Rep Y

B2 MaS 2 3 8 |B2 Ма1 4 8 

B2 MaS 2 3 9 |B2 Mal 4 9 

B2 MaS 2 3 9 |В2 Mal 4 9 

B2 MaS 2 3 9 |B2 Mal 4 11 

B2 MaS 2 3 9 |в2 Mal 4 11 

B2 MaS 2 3 9 |B2 Mal 4 11 

B2 MaS 2 11 B2 MaC 3 10 |В2 Mal 4 11 

B2 MaS 2 11 B2 MaC 3 10 |В2 Mal 4 11 

B2 MaS 2 11 B2 MaC 3 10 |В2 Ма1 4 12 

B2 MaS 2 11 B2 MaC 3 10 |В2 Mal 4 12 

B2 MaS 2 12 |B2 MaC 3 10 |B2 Mal 4 13 

B2 Ма5 2 12 |B2 MaC 3 11 B2 Mal 4 13 

B2 Ма5 2 12 |B2 MaC 3 13 |B2 Mal 4 13 
B2 Mal 4 13 
B2 Mal 4 14 
B2 Mal 4 18 

Data: Pelibuey Sheep 

Animal Birthtype Age Weight Inestro Durestro 

1 1 18.509589 52.5 28 4 

2 1 18.4438356 47.4 28 4 

3 1 19.3972603 50.2 16 20 

4 1 19.3972603 53.6 28 16 

5 1 9.43561644 47.5 28 4 

6 1 18.6739726 41.3 28 4 

7 1 20.0876712 60.5 28 4 

8 1 19.5616438 49.4 28 4 

9 1 19.5287671 53.4 28 4 

10 1 19.7589041 52.6 28 4 

11 1 29.9506849 35.5 28 4 

12 1 19.3643836 50.5 28 4 

13 1 19.0027397 62.2 28 4 

14 1 18.3452055 54.7 28 4 

15 1 20.0876712 48.7 28 4 

1 2 40.7671233 40.5 28 4 

2 2 51.1890411 49.3 12 8 

3 2 40.0767123 38.1 20 20 

4 2 54.3123288 41.9 24 8 

5 2 52.2739726 58 28 4 

6 2 53.6219178 34.8 28 4 

7 2 40.2082192 40.3 24 8 

8 2 36.4931507 34.6 28 4 

9 2 50.630137 42.1 28 4 
10 2 51.0246575 52.6 28 4 
11 2 46.3890411 32.1 20 12
12 2 50.7945206 40 16 16 
13 2 30.4109589 37.9 24 8 
14 2 30.509589 42.2 20 20 
15 2 50.6958904 33.2 24 16 
16 2 36.6246575 34.2 20 20 
17 2 30.5424658 39 12 32 
18 2 36.6246575 33.7 24 16
19 2 29.9506849 32.9 24 16 
20 2 47.2109589 39.5 32 12 
21 2 40.2082192 57.5 12 32 
22 2 52.2410959 53.3 12 28 
23 2 53.4246575 43.4 12 32 
24 2 55.5616438 46 24 16 
25 2 30.5424658 31.6 24 16 
26 2 29.0958904 47.8 20 20 
27 2 40.1424658 36 20 20 
28 2 50.7945206 42.2 24 16 
29 2 37.6767123 44.3 24 16
30 2 36.4273973 43.1 20 20 
31 2 30.5424658 38 20 20 
1 3 68.9753425 42.9 24 8 
2 3 63.1232877 44 24 8 
3 3 68.7780822 38.5 20 12 
4 3 64.3068493 48 24 8 
5 3 68.6794521 40.1 20 12 
6 3 62.630137 46.3 32 4 
7 3 69.8958904 32.5 20 12 
8 3 69.6 42.8 20 12 
9 3 63.4849315 51.3 24 8 
10 3 64.2739726 47.7 24 16 
11 3 63.5178082 44.5 12 28 
12 3 78.739726 38 12 28 
13 3 64.5369863 52.5 12 28 
14 3 62.4328767 41.2 12 28
15 3 67.660274 50.8 20 20 
16 3 63.7150685 48.2 20 24 
17 3 74.4986301 33.3 32 8 
18 3 63.6493151 45.1 24 16 
19 3 72.920548 33 24 20 
20 3 69.4027397 40.4 24 16 
21 3 69.9616438 43.3 12 28
22 3 69.6 43.2 24 16 
23 3 63.4849315 51 24 16 
24 3 63.6164384 57.4 24 16 
25 3 67.8575343 43 24 16 
26 3 63.6821918 49.7 24 16 
27 3 65.7534247 40.1 24 16 
28 3 67.9890411 33.4 20 20 
29 3 61.1835616 51.6 20 20 
30 3 63.3534247 43.3 20 20 
31 3 79.890411 44.7 24 16 
32 3 63.7150685 37.9 20 20 


Data: Gum Breakdown Times 

Time 
198.5 
299 
223.1 
166.6 
196.5 
178.9 
160.7 
151.1 
146.5 
146.3 
169.3 
198.1 
236.3 
330.7 
281.2 
151.8 
156 
159.7 
171.8 
159.3 
155.9 
175.2 
182.2 
136.2 
Chapter 8
Generalized Linear Mixed Models
for Categorical and Ordinal Responses


8.1 Introduction 


According to Agresti (2013), the multinomial distribution generalizes the
binomial distribution to cases with more than two possible outcomes, which may be
ordered (ordinal) or unordered (nominal). Given a response with more than two possible
outcomes and independent trials with the same category probabilities in each
trial, the distribution of counts across categories follows a multinomial distribution.
Quinn and Keough (2002) note that several methods exist for analyzing multinomial data.
In the biological sciences, the most common analysis of categorical data, which
produces frequency counts, consists of building cross-tabulations (contingency tables)
and applying chi-squared tests to examine associations between two or more categorical
variables. However, this approach is ill suited to studies that aim to estimate how
the response changes when the explanatory variable(s) change, because contingency
tables analyze the association between variables without distinguishing a
predictor from a response variable. The results of this analysis are valid as long as fewer
than 20% of the cells have an expected count below five and no cell has an expected
count below one (Logan 2010). For studies with small sample sizes, Fisher's exact test
is the usual alternative to the chi-squared test.
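The rule of thumb quoted from Logan (2010) is easy to check before running a chi-squared test. A minimal Python sketch, assuming the table is stored as a list of rows of counts (the function names are ours), computes the expected counts under independence, $E_{rc} = (\text{row total}_r \times \text{column total}_c)/N$, and applies the two conditions:

```python
def expected_counts(table):
    """Expected counts under independence: E_rc = row_total_r * col_total_c / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_reliable(table):
    """Fewer than 20% of cells with E < 5 and no cell with E < 1."""
    expected = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 < 0.20 and min(expected) >= 1


print(expected_counts([[10, 20], [30, 40]]))
print(chi_square_is_reliable([[10, 20], [30, 40]]))  # all expected counts >= 12
print(chi_square_is_reliable([[1, 2], [3, 50]]))     # one expected count is about 0.21
```

When the check fails, Fisher's exact test is the small-sample alternative mentioned above.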

There are several methods for modeling multinomial data; traditional methods of
multinomial data analysis include frequency (count) analysis, which uses the
chi-squared test, and log-linear models for contingency tables. This chapter
focuses on describing multinomial logit and probit models in detail.
8.2 Concepts and Definitions


For the multinomial distribution, each of the N observations belongs to exactly one of
C mutually exclusive categories, c = 1, ..., C, and $\pi_c$ (c = 1, ..., C) is the probability
that an observation belongs to category c. The multinomial distribution gives the
probability that $y_1$ randomly sampled observations from the population belong to
category 1, $y_2$ observations belong to category 2, and so forth up to category C,
where $\sum_{c=1}^{C} y_c = N$ and $\sum_{c=1}^{C} \pi_c = 1$. The probability function of this
distribution is

$$f(y_1, \ldots, y_C \mid \pi_1, \ldots, \pi_C) = \frac{N!}{y_1! \cdots y_C!}\, \pi_1^{y_1} \cdots \pi_C^{y_C}$$
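As a quick numerical check of the formula, the following Python sketch (function name ours) evaluates the multinomial probability for a set of counts. The example counts, 2, 3, and 5 plants out of N = 10 in three categories, and the probabilities 0.2, 0.3, and 0.5 are made up for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """N! / (y_1! ... y_C!) * pi_1^y_1 * ... * pi_C^y_C."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    n = sum(counts)
    coef = factorial(n)
    for y in counts:
        coef //= factorial(y)  # exact integer multinomial coefficient
    return coef * prod(p ** y for p, y in zip(probs, counts))


print(multinomial_pmf([2, 3, 5], [0.2, 0.3, 0.5]))  # 2520 * 0.04 * 0.027 * 0.03125
```

With C = 2 the function reduces to the binomial probability, which is the sense in which the multinomial generalizes the binomial distribution.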

Multinomial models are applied when the categorical response
variable has more than two possible outcomes, while the independent variables can
be continuous, categorical, or both (Hosmer and Lemeshow 2000). The categorical
response variable can be either ordinal (ordered) or nominal (unordered). Ordinal
response variables take values that represent a rank order on some dimension,
but there are too few distinct values to treat them as continuous. Nominal
(unordered) response variables take values that identify categories but carry no
information about order. Models for multinomial data are constructed in a
similar way as for binomial data, and their link functions are
similar to the logit and probit functions used for binomial data. Cumulative logit and
cumulative probit models define the link function so that, when properly fitted to
the data, they allow parsimonious modeling of ordinal multinomial data.
Generalized logit and probit models do not require ordered categories and are
therefore suitable for nominal multinomial data.

In terms of generalized linear models (GLMs) and generalized linear mixed
models (GLMMs), a multinomial distribution with C categories requires $C - 1$
link functions to fully specify a model that relates the response probabilities
($\pi_1, \pi_2, \ldots, \pi_C$) to the linear predictor. The most commonly used models are the
cumulative logit model, also known as the proportional odds model, proposed by McCullagh
(1980), and the cumulative probit model, also known as the threshold model.
Throughout this chapter, we use these two link functions
interchangeably.

The link functions for a cumulative logit model with C categories are 
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m= ռր =) =m + Xp + Zb 


E лі + 22 с> 
m = log h zi 2) =m + ХВ + Zb 


л+л + ` P gc-1 
1— (mi +m + лс 


ЖЕ JE 


where X and Z are the design matrices, whereas В and b are the vectors of fixed and 
random effects parameters, respectively. The inverse links of each of the functions 
are as follows: 


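$$\pi_1 = \frac{1}{1 + e^{-\boldsymbol{\eta}_1}} = h(\boldsymbol{\eta}_1)$$

$$\pi_1 + \pi_2 = \frac{1}{1 + e^{-\boldsymbol{\eta}_2}} = h(\boldsymbol{\eta}_2)$$

$$\vdots$$

$$\pi_1 + \pi_2 + \cdots + \pi_{C-1} = \frac{1}{1 + e^{-\boldsymbol{\eta}_{C-1}}} = h(\boldsymbol{\eta}_{C-1})$$

Once $h(\boldsymbol{\eta}_1), h(\boldsymbol{\eta}_2), \ldots, h(\boldsymbol{\eta}_{C-1})$ have been estimated, we can estimate the
probabilities $\pi_1, \pi_2, \ldots, \pi_C$ by differencing the cumulative probabilities:
$\pi_1 = h(\boldsymbol{\eta}_1)$, $\pi_c = h(\boldsymbol{\eta}_c) - h(\boldsymbol{\eta}_{c-1})$ for $c = 2, \ldots, C - 1$, and $\pi_C = 1 - h(\boldsymbol{\eta}_{C-1})$.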
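Recovering the category probabilities from the cumulative ones amounts to differencing them: $\pi_1 = h(\eta_1)$, $\pi_c = h(\eta_c) - h(\eta_{c-1})$, and $\pi_C = 1 - h(\eta_{C-1})$. A minimal Python sketch (names ours) takes the $C - 1$ cumulative linear predictors for one observation, applies the inverse logit $h$, and returns the $C$ category probabilities; the input values in the example are arbitrary.

```python
import math


def inv_logit(eta):
    """Inverse cumulative logit link: h(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def category_probabilities(etas):
    """Convert cumulative linear predictors eta_1 <= ... <= eta_{C-1} into pi_1, ..., pi_C."""
    if any(later < earlier for earlier, later in zip(etas, etas[1:])):
        raise ValueError("cumulative linear predictors must be non-decreasing")
    cumulative = [inv_logit(e) for e in etas] + [1.0]  # P(Y <= c), with P(Y <= C) = 1
    probs = [cumulative[0]]                             # pi_1 = h(eta_1)
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)                     # pi_c = h(eta_c) - h(eta_{c-1})
    return probs


print(category_probabilities([-1.0, 1.0]))  # three categories
```

The check on the ordering matters: if the cumulative predictors decreased, some differences would be negative and would not be valid probabilities.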


8.3 Cumulative Logit Models (Proportional Odds Models) 


Multinomial logit models are used to model the relationships between a polytomous
response variable and a set of predictor variables. As mentioned above, these
polytomous response models can be classified into two types, depending
on whether the response variable has an ordered or an unordered structure.

In a proportional odds model, the covariates (through the linear predictor $\eta$) have the same
effect on the cumulative log odds at every category cutoff, so changing the values of the
covariates shifts the response distribution to the right (or left) without changing its
shape. In a proportional odds model, the cumulative logits model the effect of the
covariates on the probability that the response falls in or below a given category cutoff.

A multinomial logit model assumes independence of categories, which implies
that the odds of choosing a category c relative to a category c′ (c ≠ c′) do not depend
on the characteristics of the other categories. This assumption requires that,
if a new category becomes available, the probabilities are adjusted so as to
preserve the original odds between all pairs of outcomes. The proportional
odds model makes the strict assumption that the odds ratio does not depend on the
category; therefore, we need to test the proportional odds assumption, which is
also called the "parallel regression assumption."
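The proportional odds property can be seen numerically. In the Python sketch below, the thresholds and the slope are hypothetical values chosen for illustration. Because every cumulative logit shares the same slope beta, the cumulative log odds ratio between two covariate values is identical at every cutoff; this is the property that a test of the parallel regression assumption checks.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical thresholds (one intercept per cumulative logit) and a common slope
thresholds = [-1.0, 0.5, 2.0]  # C = 4 categories -> C - 1 = 3 cumulative logits
beta = 0.8                     # same covariate effect at every cutoff


def cumulative_probs(x):
    """P(Y <= c | x) for c = 1, ..., C - 1 under eta_c = threshold_c + beta * x."""
    return [inv_logit(t + beta * x) for t in thresholds]


# Cumulative log odds ratio between x = 1 and x = 0 at each cutoff
log_odds_ratios = [
    logit(p1) - logit(p0)
    for p1, p0 in zip(cumulative_probs(1.0), cumulative_probs(0.0))
]
print(log_odds_ratios)  # each entry equals beta up to floating-point rounding
```

A model in which each cutoff had its own slope would give different entries, which is the departure from proportional odds that the test is designed to detect.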
8.3.1 Completely Randomized Design (CRD) with a Multinomial
Response: Ordinal


The data come from an experiment on red core disease in strawberries,
which is caused by the fungus Phytophthora fragariae. In this example, 12 strawberry
populations were evaluated in a completely randomized experiment with
4 replications (Table 8.1). Plots generally consisted of 10 plants; in some cases,
only 9 plants were observed. At the end of the experiment, each plant was assigned
to one of three ordered categories representing fungal damage (1 = no damage,
2 = moderate damage, and 3 = severe damage).

A total of 12 populations were obtained by crossing 3 genotypes of male parents
with 4 genotypes of female parents. The variation between and within plots is
considered minimal, whereas the genetic and nongenetic effects are more important,
as plants from the same cross are not genetically identical.

The model for the cumulative probabilities of these data is a GLMM with a
classification effect for the treatment variable (the population resulting from
crossing genotypes). The GLMM for multinomial ordered outcomes with
C categories requires $C - 1$ link function equations to fully specify the model that
relates the response probabilities ($\pi_1, \pi_2, \ldots, \pi_C$) to the linear predictor $\eta_{ij}$ (Stroup
2013). Each of the $C - 1$ cumulative logit equations contrasts categories 1, ..., c with
the remaining categories c + 1, ..., C, for c = 1, 2, ..., C - 1.


Table 8.1 Evaluation of red core disease in strawberry plants

Parent plant    Repetition 1    Repetition 2    Repetition 3    Repetition 4
                Disease category (1, 2, 3) within each repetition
Male  Female    1   2   3       1   2   3       1   2   3       1   2   3
1     1         0   3   6       2   2   6       2   3   5       2   5   3
1     2         2   3   5       0   3   7       4   6   0       2   3   5
1     3         3   4   3       7   2   1       1   1   7       2   3   5
1     4         0   5   5       5   4   1       2   8   0       1   4   5
2     1         1   4   4       2   2   6       1   2   7       1   5   4
2     2         1   4   5       3   4   2       1   6   3       4   2   4
2     3         4   3   3       5   1   4       3   3   4       4   2   4
2     4         1   4   5       1   2   6       ?   5   0       2   5   3
3     1         0   0   9       3   5   2       2   5   3       0   0   10
3     2         5   3   2       3   2   5       3   1   6       2   1   7
3     3         0   3   6       2   5   3       1   3   6       0   3   7
3     4         3   0   7       5   2   3       7   3   0       3   4   3


8.3 Cumulative Logit Models (Proportional Odds Models) 325 


The components of the GLMM with an ordinal multinomial response are as 
follows: 


Distributions: $y_{1ij}, y_{2ij}, y_{3ij} \mid r_j \sim \text{Multinomial}(N_{ij}, \pi_{1ij}, \pi_{2ij}, \pi_{3ij})$, where $y_{1ij}$, $y_{2ij}$, and $y_{3ij}$ are the observed frequencies of responses (damage level) in each category c (1 = no damage, 2 = moderate damage, and 3 = severe damage) and $r_j$ is the random effect due to repetition, assuming $r_j \sim N(0, \sigma_r^2)$.

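Linear predictor: $\eta_{cij} = \eta_c + \tau_i + r_j$, where $\eta_{cij}$ is the $c$th link ($c = 1, 2$) that relates the mean and the linear predictor for the $i$th treatment ($i = 1, 2, \ldots, 12$) and the $j$th repetition ($j = 1, 2, 3, 4$); $\eta_c$ is the intercept for the $c$th link; $\tau_i$ is the fixed effect due to the $i$th treatment (cross); and $r_j$ is the random effect due to the $j$th repetition ($r_j \sim N(0, \sigma_r^2)$). The link functions for each category are as follows:

$$\eta_{1ij} = \log\left(\frac{\pi_{1ij}}{1 - \pi_{1ij}}\right) = \eta_1 + \tau_i + r_j$$

$$\eta_{2ij} = \log\left(\frac{\pi_{1ij} + \pi_{2ij}}{1 - (\pi_{1ij} + \pi_{2ij})}\right) = \eta_2 + \tau_i + r_j$$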


The following GLIMMIX program fits a cumulative logit model with an ordinal 
multinomial response in a CRD. 


proc glimmix data=FRESA;
   class rep trt cat;
   /* order=data: categories are ranked in the order they first appear */
   model cat(order=data) = trt / dist=multinomial link=clogit solution
                                 oddsratio;
   random intercept / subject=rep solution;
   /* c=1: Without|Moderate boundary; c=2: Moderate|Severe boundary */
   estimate 'c=1, t=1'  intercept 1 0 trt 1 0 0 0 0 0 0 0 0 0 0 0,
            'c=2, t=1'  intercept 0 1 trt 1 0 0 0 0 0 0 0 0 0 0 0,
            'c=1, t=2'  intercept 1 0 trt 0 1 0 0 0 0 0 0 0 0 0 0,
            'c=2, t=2'  intercept 0 1 trt 0 1 0 0 0 0 0 0 0 0 0 0,
            'c=1, t=3'  intercept 1 0 trt 0 0 1 0 0 0 0 0 0 0 0 0,
            'c=2, t=3'  intercept 0 1 trt 0 0 1 0 0 0 0 0 0 0 0 0,
            'c=1, t=4'  intercept 1 0 trt 0 0 0 1 0 0 0 0 0 0 0 0,
            'c=2, t=4'  intercept 0 1 trt 0 0 0 1 0 0 0 0 0 0 0 0,
            'c=1, t=5'  intercept 1 0 trt 0 0 0 0 1 0 0 0 0 0 0 0,
            'c=2, t=5'  intercept 0 1 trt 0 0 0 0 1 0 0 0 0 0 0 0,
            'c=1, t=6'  intercept 1 0 trt 0 0 0 0 0 1 0 0 0 0 0 0,
            'c=2, t=6'  intercept 0 1 trt 0 0 0 0 0 1 0 0 0 0 0 0,
            'c=1, t=7'  intercept 1 0 trt 0 0 0 0 0 0 1 0 0 0 0 0,
            'c=2, t=7'  intercept 0 1 trt 0 0 0 0 0 0 1 0 0 0 0 0,
            'c=1, t=8'  intercept 1 0 trt 0 0 0 0 0 0 0 1 0 0 0 0,
            'c=2, t=8'  intercept 0 1 trt 0 0 0 0 0 0 0 1 0 0 0 0,
            'c=1, t=9'  intercept 1 0 trt 0 0 0 0 0 0 0 0 1 0 0 0,
            'c=2, t=9'  intercept 0 1 trt 0 0 0 0 0 0 0 0 1 0 0 0,
            'c=1, t=10' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 1 0 0,
            'c=2, t=10' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 1 0 0,
            'c=1, t=11' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 0 1 0,
            'c=2, t=11' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 0 1 0,
            'c=1, t=12' intercept 1 0 trt 0 0 0 0 0 0 0 0 0 0 0 1,
            'c=2, t=12' intercept 0 1 trt 0 0 0 0 0 0 0 0 0 0 0 1 / ilink;
   /* freq holds the number of plants in each rep x trt x cat cell */
   freq freq;
run;


Table 8.2 Data ordered

Rep    Cross   Cat        Freq
rep1   M1H1    Without    0
rep1   M1H2    Without    2
rep1   M1H3    Without    3
rep1   M1H4    Without    0
rep1   M1H1    Moderate   3
rep1   M1H2    Moderate   3
rep1   M1H3    Moderate   4
rep1   M1H4    Moderate   5
rep1   M1H1    Severe     6
rep1   M1H2    Severe     5
rep1   M1H3    Severe     3


Although most of the GLIMMIX commands have already been described in
previous examples, it is important to emphasize that the data must be arranged with
one line per combination of repetition, treatment, and lesion category, together with
the frequency or number of observations (Y); in this case, these are the
variables rep, trt (trt = cross), cat (category), and freq, respectively. Part of the data
arrangement can be seen in Table 8.2, whereas the rest of the dataset can be found in
the Appendix (Data: CRD with multinomial response: ordinal).
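Because GLIMMIX reads one record per repetition × cross × category, a wide table such as Table 8.1 has to be reshaped into the long layout of Table 8.2 before the analysis. The Python sketch below shows one way to do this for a single cross; the helper name `wide_to_long` and the `repN` labels are illustrative assumptions, not part of the original program. Since ORDER=DATA ranks categories by their first appearance, the helper always emits Without, Moderate, and Severe in that sequence.

```python
# Sketch: reshape one wide row of Table 8.1 into (rep, cross, cat, freq) records.
# Helper name and "repN" labels are illustrative assumptions.
CATEGORIES = ("Without", "Moderate", "Severe")  # natural order kept by ORDER=DATA


def wide_to_long(cross, counts_by_rep):
    """Return (rep, cross, cat, freq) records.

    counts_by_rep holds one (without, moderate, severe) tuple per repetition.
    """
    records = []
    for rep_number, counts in enumerate(counts_by_rep, start=1):
        if len(counts) != len(CATEGORIES):
            raise ValueError("each repetition needs one count per category")
        for cat, freq in zip(CATEGORIES, counts):
            records.append((f"rep{rep_number}", cross, cat, freq))
    return records


# Cross M1H1 (male parent 1 x female parent 1), counts from Table 8.1
m1h1_counts = [(0, 3, 6), (2, 2, 6), (2, 3, 5), (2, 5, 3)]
m1h1_records = wide_to_long("M1H1", m1h1_counts)
for record in m1h1_records[:3]:
    print(record)  # the rep1 counts agree with Table 8.2
```

Sorting the resulting records by repetition and category, rather than by cross, gives exactly the row order shown in Table 8.2.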

In the program commands of this example, “order = data” indicates that the response
categories are treated as ordinal in the order in which they first appear in the dataset.
The categories should therefore appear in their natural order: no injury (Without),
moderate injury (Moderate), and severe injury (Severe). If this option is omitted,
GLIMMIX orders the categories alphabetically or numerically, depending on how the
data are coded. The ESTIMATE statement specifies the estimable functions that form
the boundaries between the categories for each of the populations (trt). Finally, the
FREQ statement instructs GLIMMIX to use freq as the number of observations
(frequency) in the corresponding category. In this way, the first estimate, “c=1, t=1”,
defines the linear predictor $\eta_1 + \tau_1$, that is, the boundary between the “Without”
and “Moderate” categories for treatment 1, with its corresponding logit
$\log\left[\pi_{11}/(1 - \pi_{11})\right]$, whereas the second estimate, “c=2, t=1”, defines
$\eta_2 + \tau_1$, the boundary between the “Moderate” and “Severe” damage categories, with
the logit $\log\left[(\pi_{11} + \pi_{21})/(1 - \pi_{11} - \pi_{21})\right]$, whose inverse estimates the
probability of observing a plant from
Table 8.3 Results of the multinomial analysis of variance for injury level in strawberry plants

(a) Covariance parameter estimates

Cov Parm    Subject   Estimate   Standard error
Intercept   Rep       0.1453     0.1437

(b) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      11       457      2.60      0.0032
Table 8.4 Fixed effects solution for injury categories

Solutions for fixed effects

Effect      Cat             Trt    Estimate   Standard error   DF    t-value   Pr > |t|
Intercept   Without (η1)           -0.4571    0.3526           3     -1.30     0.2855
Intercept   Moderate (η2)          1.0631     0.3558           3     2.99      0.0582
Trt (τ1)                    M1H1   -1.1456    0.4264           457   -2.69     0.0075
Trt (τ2)                    M1H2   -0.8355    0.4179           457   -2.00     0.0462
Trt (τ3)                    M1H3   -0.4621    0.4171           457   -1.11     0.2685
Trt (τ4)                    M1H4   -0.4716    0.4145           457   -1.14     0.2558
Trt (τ5)                    M2H1   -1.2644    0.4295           457   -2.94     0.0034
Trt (τ6)                    M2H2   -0.6060    0.4181           457   -1.45     0.1479
Trt (τ7)                    M2H3   -0.2332    0.4140           457   -0.56     0.5735
Trt (τ8)                    M2H4   -0.3912    0.4168           457   -0.94     0.3484
Trt (τ9)                    M3H1   -1.5563    0.4393           457   -3.54     0.0004
Trt (τ10)                   M3H2   -0.4508    0.4144           457   -1.09     0.2772
Trt (τ11)                   M3H3   -1.4426    0.4350           457   -3.32     0.0010
Trt (τ12)                   M3H4   0          .                .     .         .


population 1 (trt = M1H1) with “Without” or “Moderate” damage when
exposed to the fungus (Phytophthora fragariae). Part of the output is presented in
Table 8.3.

The estimated variance component due to repetitions (part (a)) is $\hat\sigma_r^2 = 0.1453$,
whereas the type III tests of fixed effects (part (b)) indicate that the crosses differ
significantly in their tolerance to fungal attack (F = 2.60, P = 0.0032). The fixed
effects solution, obtained by specifying the SOLUTION option in the MODEL
statement, is shown in Table 8.4.

From the fixed effects solution, we can estimate the linear predictors for the two
category boundaries of each treatment on the model scale. For example, for
treatment 1, the first boundary has $\hat\eta_{11} = \hat\eta_1 + \hat\tau_1 = -0.4571 + (-1.1456) = -1.6027$,
where $\hat\eta_1$ defines the boundary between the “Without” and “Moderate” damage
categories, and the second boundary has $\hat\eta_{21} = \hat\eta_2 + \hat\tau_1 = 1.0631 + (-1.1456) = -0.0825$,
where $\hat\eta_2$ defines the boundary between the “Moderate” and “Severe” damage
categories. Note that, in the proportional odds model, the $\tau_i$ are not
category-specific: each treatment effect shifts both boundaries by the same amount.
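As a quick check of this arithmetic, the Python sketch below (our illustration, not part of the SAS workflow) adds each rounded $\hat\tau_i$ from Table 8.4 to the two intercepts for all 12 crosses. Because the inputs are rounded to four decimals, a few results can differ from Table 8.6 in the last digit (for example, -2.0134 versus -2.0133 for cross M3H1). The names `ETA`, `TAU`, and `linear_predictors` are hypothetical.

```python
# Sketch: boundary-specific linear predictors eta_c + tau_i for every cross,
# using the rounded estimates from Table 8.4 (CRD, cumulative logit).
ETA = {1: -0.4571, 2: 1.0631}  # c=1: Without|Moderate, c=2: Moderate|Severe
TAU = {
    "M1H1": -1.1456, "M1H2": -0.8355, "M1H3": -0.4621, "M1H4": -0.4716,
    "M2H1": -1.2644, "M2H2": -0.6060, "M2H3": -0.2332, "M2H4": -0.3912,
    "M3H1": -1.5563, "M3H2": -0.4508, "M3H3": -1.4426, "M3H4": 0.0,
}


def linear_predictors(tau_i, eta=ETA):
    """Shift every boundary by the same treatment effect (proportional odds)."""
    return {c: round(eta_c + tau_i, 4) for c, eta_c in eta.items()}


predictors = {cross: linear_predictors(t) for cross, t in TAU.items()}
for cross, lp in predictors.items():
    # the gap lp[2] - lp[1] stays at about 1.5202 for every cross
    print(cross, lp[1], lp[2])
```

The constant gap between the two boundaries is the proportional odds assumption made visible: only the common shift $\tau_i$ changes from cross to cross.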
Table 8.5 Estimated odds ratio

Odds ratio estimates
Trt   _Trt   Estimate   DF   95% Confidence limits


The odds ratios in Table 8.5 are obtained as $e^{\hat\tau_i}$ for crosses 1-12, i.e., relative to the
reference cross 12 (M3H4), for which $\hat\tau_{12} = 0$. Since the $\tau_i$ are not specific to a
particular category, each odds ratio is the same at both category boundaries, hence the name proportional odds.
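The point estimates behind Table 8.5 can be recomputed from the rounded Table 8.4 estimates, as in the Python sketch below. These are our own back-calculations, not the SAS output, and they omit confidence limits. Under the parameterization used in this section (the linear predictor models the cumulative logit of the less damaged categories), an odds ratio below 1 indicates lower odds of little or no damage than for the reference cross M3H4.

```python
import math

# Sketch: odds ratio of each cross relative to the reference cross M3H4,
# computed as exp(tau_i) from the rounded Table 8.4 estimates (tau_12 = 0).
TAU = {
    "M1H1": -1.1456, "M1H2": -0.8355, "M1H3": -0.4621, "M1H4": -0.4716,
    "M2H1": -1.2644, "M2H2": -0.6060, "M2H3": -0.2332, "M2H4": -0.3912,
    "M3H1": -1.5563, "M3H2": -0.4508, "M3H3": -1.4426, "M3H4": 0.0,
}

odds_ratios = {cross: math.exp(t) for cross, t in TAU.items()}
for cross, value in odds_ratios.items():
    # the same value applies at both boundaries (proportional odds)
    print(f"{cross} vs. M3H4: {value:.3f}")
```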

Table 8.6 shows the estimated linear predictors $\hat\eta_{ci} = \hat\eta_c + \hat\tau_i$ on the model
scale (“Estimate” column), together with their inverse-link values on the data scale
(“Mean” column), which are the cumulative probabilities for each boundary c and treatment i.

Thus, for c = 1, t = 1 (boundary after the “Without” damage category, treatment 1), the
estimate is $\hat\eta_{11} = -1.6027$, and for c = 2, t = 1 (boundary after the “Moderate”
damage category, treatment 1), the linear predictor is $\hat\eta_{21} = -0.0825$. Taking the inverse
of the link function for the first boundary yields $\hat\pi_{11} = 1/(1 + e^{1.6027}) = 0.1676$,
the estimated probability that a plant of cross (treatment) M1H1 receives a “Without
damage” score. This inverse value is presented in the “Mean” column of Table 8.6.

Now, for c = 2, t = 1, the inverse of the link yields the cumulative probability
$\hat\pi_{11} + \hat\pi_{21} = 1/(1 + e^{0.0825}) = 0.4794$. From this value, we deduce the probabilities
of observing “Moderate” and “Severe” damage in the plots of cross M1H1. For
“Moderate” damage, the probability is $\hat\pi_{21} = 0.4794 - \hat\pi_{11} = 0.4794 - 0.1676 = 0.3118$,
and, for “Severe” damage, it is $\hat\pi_{31} = 1 - (\hat\pi_{11} + \hat\pi_{21}) = 1 - 0.4794 = 0.5206$.
The probabilities for the remaining crosses are estimated in the same way.
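The inverse-link-and-subtract steps above can be written as one small generic function. This is a minimal Python sketch, assuming the boundary linear predictors are supplied in category order and that the inverse link returns cumulative probabilities P(Y ≤ c); the function names are illustrative and do not come from GLIMMIX.

```python
import math


def inv_logit(x):
    """Inverse of the logit link: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(boundary_predictors, inv_link=inv_logit):
    """Turn C-1 ordered boundary predictors into C category probabilities.

    The inverse link gives cumulative probabilities P(Y <= c); successive
    differences (with P(Y <= C) = 1) give the individual categories.
    """
    cumulative = [inv_link(x) for x in boundary_predictors] + [1.0]
    probabilities, previous = [], 0.0
    for p in cumulative:
        probabilities.append(p - previous)
        previous = p
    return probabilities


# Cross M1H1 (treatment 1): eta_11 = -1.6027, eta_21 = -0.0825 (Table 8.6)
p_without, p_moderate, p_severe = category_probabilities([-1.6027, -0.0825])
print(round(p_without, 4), round(p_moderate, 4), round(p_severe, 4))
# prints: 0.1676 0.3118 0.5206
```

Passing a different inverse link (for example, the standard normal CDF) turns the same function into the cumulative probit version.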


8.3.2 Randomized Complete Block Design (RCBD) 
with a Multinomial Response: Ordinal 


In recent years, the poultry industry has become increasingly concerned with animal welfare,
which is associated with bird mortality, behavior, and health, among other factors (Stanley 1981;
Martrenchar et al. 2002). One of the diseases related to animal welfare is footpad
dermatitis, which, among other consequences, impairs a bird's ability to walk (Bilgili
et al. 2009). Pododermatitis, also known as contact dermatitis or footpad dermatitis,
Table 8.6 Estimates on the model scale (Estimate) and on the data scale (Mean) for the damage
categories in strawberry plants

Estimates

Label       Estimate   Standard error   DF    t-value   Pr > |t|   Mean     Standard error mean
c=1, t=1    -1.6027    0.3706           457   -4.33     <0.0001    0.1676   0.05170
c=2, t=1    -0.08254   0.3625           457   -0.23     0.8200     0.4794   0.09047
c=1, t=2    -1.2926    0.3597           457   -3.59     0.0004     0.2154   0.06080
c=2, t=2    0.2276     0.3542           457   0.64      0.5208     0.5567   0.08741
c=1, t=3    -0.9191    0.3572           457   -2.57     0.0104     0.2851   0.07281
c=2, t=3    0.6010     0.3555           457   1.69      0.0916     0.6459   0.08131
c=1, t=4    -0.9286    0.3542           457   -2.62     0.0090     0.2832   0.07190
c=2, t=4    0.5915     0.3524           457   1.68      0.0939     0.6437   0.08081
c=1, t=5    -1.7214    0.3744           457   -4.60     <0.0001    0.1517   0.04818
c=2, t=5    -0.2013    0.3656           457   -0.55     0.5822     0.4499   0.09047
c=1, t=6    -1.0631    0.3590           457   -2.96     0.0032     0.2567   0.06850
c=2, t=6    0.4571     0.3557           457   1.28      0.1995     0.6123   0.08444
c=1, t=7    -0.6903    0.3526           457   -1.96     0.0509     0.3340   0.07842
c=2, t=7    0.8299     0.3533           457   2.35      0.0193     0.6963   0.07471
c=1, t=8    -0.8483    0.3566           457   -2.38     0.0178     0.2998   0.07485
c=2, t=8    0.6719     0.3556           457   1.89      0.0595     0.6619   0.07958
c=1, t=9    -2.0133    0.3864           457   -5.21     <0.0001    0.1178   0.04016
c=2, t=9    -0.4932    0.3759           457   -1.31     0.1902     0.3791   0.08849
c=1, t=10   -0.9079    0.3540           457   -2.56     0.0106     0.2874   0.07250
c=2, t=10   0.6123     0.3524           457   1.74      0.0830     0.6485   0.08033
c=1, t=11   -1.8997    0.3813           457   -4.98     <0.0001    0.1301   0.04317
c=2, t=11   -0.3795    0.3714           457   -1.02     0.3074     0.4062   0.08958
c=1, t=12   -0.4571    0.3526           457   -1.30     0.1955     0.3877   0.08369
c=2, t=12   1.0631     0.3558           457   2.99      0.0030     0.7433   0.06789


is characterized by inflammation and necrotic lesions extending from the plantar surface
deep into the footpads of chickens. Deep ulcers may result in abscesses and in the
thickening of the underlying tissues and structures (Greene et al. 1985).

Chicken feet have great economic importance because they are in high demand in
foreign markets, mainly in Southeast Asia and China; however, diseases or
alterations such as pododermatitis cause significant economic losses, since
diseased feet are not suitable for human consumption, which in turn is reflected
in market prices (Taira et al. 2014). Because of the economic importance of this product,
Garcia et al. (2010) focused on studying the factors that cause this disease and
Table 8.7 Treatment design

Treatment   Features
Trt1        Traditional program + 1 kg m⁻² of rice husks
Trt2        Traditional program + 2 kg m⁻² of rice husks
Trt3        Traditional program + foot health program + 1 kg m⁻² of rice husks
Trt4        Traditional program + foot health program + 2 kg m⁻² of rice husks


on finding strategies to reduce leg and carcass lesions in poultry. Important factors in 
broiler fattening are the type of litter, litter height, nutrition and feeding programs, 
and bird health, among others. 

The objective of this study was to evaluate the effect of litter density and of organic
minerals (Availa Zn and Availa Mn) with an extract of Yucca schidigera (Micro-Aid),
as a supplement to a traditional fattening program, on the development of
footpad dermatitis in broilers. The genetic material used in this experiment was
mainly male Ross line chickens. The poultry farm's traditional broiler fattening
program lasts 50 days and consists of three phases: a starter diet (1-18 days), a grower
diet (19-35 days), and a finisher diet (36-50 days), with rice husk used as bedding
material at a density of 1 kg m⁻². In this research, a foot health program was
implemented in addition to the traditional fattening program, which included the
addition of 125 ppm of Micro-Aid (Yucca schidigera extract), 40 ppm of Availa Zn,
and 40 ppm of Availa Mn to the fattening diet.

Based on the above information, four treatments were evaluated at two poultry 
farms, as described below: 


• Treatment 1 involved the application of the company’s traditional fattening
program (Trt1).

• Treatment 2 was the company's traditional fattening program plus an increase in
litter density from 1 to 2 kg m⁻² (Trt2).

• Treatment 3 was the traditional fattening program plus the implementation of the
foot health program throughout the fattening period (Trt3).

• Treatment 4 consisted of the traditional fattening program plus the implementa-
tion of the foot health program and an increase in litter density from 1 to 2 kg m⁻²
(Trt4).

Table 8.7 lists the treatments studied.


The response variable was the degree of footpad lesion (pododermatitis) at the end
of the fattening period (50 days), evaluated on 1250 chickens per treatment. The
degree of footpad lesion was determined with a visual guide for lesions in chickens
based on the method of De Jong and Guémené (2012), which defines three grades:
grade 0 is assigned to legs with no lesions, grade 1 to legs with lesions on part of the
footpad (<50%), and grade 2 to legs with extensive lesions (50-100% of the
footpad). Table 8.8 shows the dataset, indicating the block, treatment, lesion level,
and number of birds observed with a given lesion (frequency).




Table 8.8 Pododermatitis in broilers 


Block Trt Category Frequency Block Trt Category Frequency 
1 1 Without 26 1 3 Without 54 
1 1 Slight 58 1 3 Slight 43 
1 1 Severe 17 1 3 Severe 3 
2 1 Without 37 2 3 Without 25 
2 1 Slight 56 2 3 Slight 69 
2 1 Severe 6 2 3 Severe 7 
1 2 Without 40 1 4 Without 65 
1 2 Slight 57 1 4 Slight 34 
1 2 Severe 3 1 4 Severe 1 
2 2 Without 77 2 4 Without 63 
2 2 Slight 23 2 4 Slight 36 
2 2 Severe 0 2 4 Severe 0 


Note: Without stands for no lesion, slight stands for moderate lesion, and severe stands for severe 
lesion 


The GLMM for ordinal multinomial outcomes with $C$ categories requires $C - 1$ link
function equations instead of one to fully specify a model that relates the response
probabilities ($\pi_1, \pi_2, \ldots, \pi_C$) to the linear predictor $\eta$ (Stroup 2013). Each of the $C - 1$
cumulative logit equations contrasts categories $1, \ldots, c$ with the remaining categories, for $c = 1, 2, \ldots, C - 1$.

The link functions for the cumulative logit model to describe the response 
variable with C categories are as follows: 


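$$\eta_{1ij} = \log\left(\frac{\pi_{1ij}}{1 - \pi_{1ij}}\right) = \eta_1 + \tau_i + b_j$$

$$\eta_{2ij} = \log\left(\frac{\pi_{1ij} + \pi_{2ij}}{1 - (\pi_{1ij} + \pi_{2ij})}\right) = \eta_2 + \tau_i + b_j$$

$$\vdots$$

$$\eta_{(C-1)ij} = \log\left(\frac{\pi_{1ij} + \cdots + \pi_{(C-1)ij}}{1 - (\pi_{1ij} + \cdots + \pi_{(C-1)ij})}\right) = \eta_{C-1} + \tau_i + b_j$$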


The components of the GLMM with an ordinal multinomial response variable are 
as follows: 


Distributions: $y_{0ij}, y_{1ij}, y_{2ij} \mid b_j \sim \text{Multinomial}(N_{ij}, \pi_{0ij}, \pi_{1ij}, \pi_{2ij})$, where $y_{0ij}$, $y_{1ij}$, and $y_{2ij}$
are the observed frequencies of the responses (footpad lesion) in each category (none,
slight, and severe) and $b_j$ is the random effect due to block, assuming
$b_j \sim N(0, \sigma_b^2)$.

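Linear predictor: $\eta_{cij} = \eta_c + \tau_i + b_j$, where $\eta_{cij}$ is the $c$th link ($c = 0, 1$) for treatment
$i$ and block $j$, $\eta_c$ is the intercept for the $c$th link, $\tau_i$ is the fixed effect due to the $i$th
treatment, and $b_j$ is the random effect due to the $j$th block ($b_j \sim N(0, \sigma_b^2)$). The
link functions for each category are as follows:

$$\eta_{0ij} = \log\left(\frac{\pi_{0ij}}{1 - \pi_{0ij}}\right) = \eta_0 + \tau_i + b_j$$

$$\eta_{1ij} = \log\left(\frac{\pi_{0ij} + \pi_{1ij}}{1 - (\pi_{0ij} + \pi_{1ij})}\right) = \eta_1 + \tau_i + b_j$$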
The following GLIMMIX commands fit a cumulative logit model with an ordinal
multinomial response. 


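proc glimmix data=multinomial_ord;
   class block trt;
   /* order=data: categories are ranked as they first appear (Without, Moderate, Severe) */
   model categoria(order=data) = trt / dist=multinomial link=clogit
         solution oddsratio(diff=last label);
   random intercept / subject=block;
   /* c=0: Without|Moderate boundary; c=1: Moderate|Severe boundary */
   estimate 'c=0, t=1' intercept 1 0 trt 1 0 0 0,
            'c=1, t=1' intercept 0 1 trt 1 0 0 0,
            'c=0, t=2' intercept 1 0 trt 0 1 0 0,
            'c=1, t=2' intercept 0 1 trt 0 1 0 0,
            'c=0, t=3' intercept 1 0 trt 0 0 1 0,
            'c=1, t=3' intercept 0 1 trt 0 0 1 0,
            'c=0, t=4' intercept 1 0 trt 0 0 0 1,
            'c=1, t=4' intercept 0 1 trt 0 0 0 1 / ilink;
   /* y holds the number of birds in each block x trt x category cell */
   freq y;
run;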


The data should have one column each for block, treatment, lesion category, and
frequency or number of observations (Y); in this program, these are the
variables block, trt, categoria, and y, respectively.

Most of the options in the above syntax have already been explained; the
“order = data” option specifies that the categories are treated as ordinal, from
lowest to highest, in the order in which they appear in the dataset. If this option is
not used with the response variable in the MODEL statement, proc GLIMMIX
orders the categories alphabetically or numerically, depending on whether they are
coded as names or as numbers. The FREQ y statement instructs GLIMMIX to use y as
the number of observations in the corresponding category. The ESTIMATE statement
specifies the estimable functions that form the boundaries between categories for
each of the four treatments. For example, the first estimate, “c = 0, t = 1”, defines
$\eta_0 + \tau_1$, that is, the boundary between the categories “Without” (no lesion) and
“Moderate” (slight lesion) for treatment 1. This first estimate corresponds to the logit
$\log\left[\pi_{01}/(1 - \pi_{01})\right]$, whose inverse is the probability that a chicken receiving
treatment 1 has a lesion classified under category 0 (no lesion). The second estimate,
“c = 1, t = 1”, defines $\eta_1 + \tau_1$, that is, the boundary between the categories
“Moderate” (slight lesion) and
Table 8.9 Results of the analysis of variance in the multinomial cumulative logit model

(a) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      3        794      22.45     <0.0001

(b) Solutions for fixed effects

Effect           categoria   Trt   Estimate   Standard error
Intercept (η0)   Without           0.6144     0.1799
Intercept (η1)   Moderate          3.8787     0.2465
Trt (τ1)                     1     -1.5034    0.2086
Trt (τ2)                     2     -0.2509    0.2055
Trt (τ3)                     3     -1.0365    0.2036
Trt (τ4)                     4     0


“Severe” (severe lesion) for treatment 1, and corresponds to the logit
$\log\left[(\pi_{01} + \pi_{11})/(1 - \pi_{01} - \pi_{11})\right]$, and so on. By taking the inverse of these
link values, we obtain the estimated cumulative probabilities $\pi_{01}$ and $\pi_{01} + \pi_{11}$, from
which the individual category probabilities follow by subtraction. Part of the SAS
GLIMMIX output is presented in Table 8.9.

The results of the analysis of variance in part (a) of Table 8.9 indicate that the
degree of lesion in the chicken footpad (pododermatitis) differed significantly among
the treatments tested (P < 0.0001). Therefore, the hypothesis of no treatment effect
($H_0: \tau_i = 0$ for all $i$, that is, all odds ratios equal 1) is rejected.

In part (b) of Table 8.9, the estimated intercepts $\hat\eta_0 = 0.6144$ and $\hat\eta_1 = 3.8787$
define the boundary between the “Without” and “Moderate” lesion categories and
the boundary between the “Moderate” and “Severe” lesion categories, respectively.
The estimated treatment effects ($\hat\tau_i$) show how both boundaries move up or down
when a given treatment is applied. All estimated treatment coefficients are negative
relative to treatment 4, which lowers the cumulative probabilities of the less severe
categories. This means that chickens under treatments 1-3 have a lower probability
of remaining free of lesions and a higher probability of developing moderate or
severe lesions than chickens under treatment 4.

To calculate the probability that a chicken will not develop footpad dermatitis
(c = 0) when receiving treatment 1, that is, “c = 0, Trt = 1,” we first estimate the
linear predictor $\hat\eta_{01} = \hat\eta_0 + \hat\tau_1 = 0.6144 + (-1.5034) = -0.889$ and, taking the
inverse, obtain $\hat\pi_{01} = 1/(1 + e^{0.889}) = 0.291$. This value is the estimated probability
that a chicken will not develop footpad dermatitis when receiving treatment 1. Now,
for “c = 1, Trt = 1,” $\hat\eta_{11} = \hat\eta_1 + \hat\tau_1 = 3.8787 + (-1.5034) = 2.3753$,
whose inverse value is 0.915. This value is an estimate of the cumulative probability
$\pi_{01} + \pi_{11}$, from which we obtain the probabilities that a chicken will develop a
moderate lesion and a severe lesion. For a moderate lesion, the probability is
$\hat\pi_{11} = 0.915 - \hat\pi_{01} = 0.915 - 0.291 = 0.624$, and, for a severe lesion, it is
$\hat\pi_{21} = 1 - 0.915 = 0.085$. The probabilities for the categories (c = 0, 1, 2) of the
remaining treatments are computed in the same way.
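The same computation for all four treatments is sketched below in Python, using the rounded estimates from Table 8.9(b). It reproduces the probabilities plotted in Fig. 8.1 to about three decimals (for treatment 4, the severe-lesion probability comes out at about 0.020). The variable names are our own; this is an illustration, not GLIMMIX output.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


# Estimates from Table 8.9(b): c=0 is Without|Moderate, c=1 is Moderate|Severe
ETA0, ETA1 = 0.6144, 3.8787
TAU = {1: -1.5034, 2: -0.2509, 3: -1.0365, 4: 0.0}

probs_by_trt = {}
for trt, tau in TAU.items():
    cum0 = inv_logit(ETA0 + tau)  # P(no lesion)
    cum1 = inv_logit(ETA1 + tau)  # P(no lesion or slight lesion)
    probs_by_trt[trt] = (cum0, cum1 - cum0, 1.0 - cum1)  # none, slight, severe
    print(trt, [round(p, 3) for p in probs_by_trt[trt]])
```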
Table 8.10 Estimated odds ratio


Odds ratio estimates 


Comparison Estimate DF 95% Confidence limits 

trt 1 vs. 4 0.222 794 0.148 0.335 
trt 2 vs. 4 0.778 794 0.520 1.165 
trt 3 vs. 4 0.355 794 0.238 0.529 


Table 8.11 Estimates on the model scale (Estimate) and on the data scale (Mean) for footpad
dermatitis categories in the multinomial cumulative logit model

Estimates

Label      Estimate   Standard error   DF    t-value   Pr > |t|   Mean     Standard error mean
c=0, t=1   -0.8893    0.2428           794   -4.93     <0.0001    0.2914   0.001174
c=1, t=1   2.3753     0.2214           794   10.73     <0.0001    0.9149   0.01724
c=0, t=2   0.3634     0.1757           794   2.07      0.0390     0.5899   0.04252
c=1, t=2   3.6277     0.2420           794   14.99     <0.0001    0.9741   0.006103
c=0, t=3   -0.4222    0.1740           794   -2.43     0.0155     0.3960   0.04162
c=1, t=3   2.8422     0.2304           794   12.34     <0.0001    0.9449   0.01199
c=0, t=4   0.6144     0.1799           794   3.41      0.0007     0.6489   0.04098
c=1, t=4   3.8787     0.2465           794   15.73     <0.0001    0.9797   0.004893


The odds ratios tabulated in Table 8.10 are $e^{\hat\tau_i}$ for treatments 1-3, that is, the
estimated cumulative odds ratios of treatment $i$ ($i = 1, 2, 3$) relative to treatment 4
(for which $\hat\tau_4 = 0$). Because the $\tau_i$ are not category-specific, the same odds ratio
applies to “Without” lesion versus “Moderate” or “Severe” lesion and to “Without” or
“Moderate” lesion versus “Severe” lesion (hence the name “proportional odds”).
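The entries of Table 8.10 can be reproduced approximately from Table 8.9(b) as $e^{\hat\tau_i}$ with Wald limits $e^{\hat\tau_i \pm q\,\mathrm{SE}}$. In this Python sketch we assume a standard normal quantile $q \approx 1.96$ in place of the t quantile with 794 DF that the DF column of Table 8.10 suggests SAS uses, so the limits can differ from the table in the third decimal (for example, 1.164 versus 1.165 for treatment 2).

```python
import math
from statistics import NormalDist

# Estimates and standard errors of tau_i from Table 8.9(b); treatment 4 is the reference
TAU_SE = {
    "trt 1 vs. 4": (-1.5034, 0.2086),
    "trt 2 vs. 4": (-0.2509, 0.2055),
    "trt 3 vs. 4": (-1.0365, 0.2036),
}

z = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile (794 DF)

odds_ratio_ci = {}
for label, (est, se) in TAU_SE.items():
    odds_ratio_ci[label] = (
        math.exp(est),            # point estimate
        math.exp(est - z * se),   # lower 95% limit
        math.exp(est + z * se),   # upper 95% limit
    )
    print(label, [round(v, 3) for v in odds_ratio_ci[label]])
```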

These odds ratios are consistent with the fixed effects test: the 95% confidence limits
for treatments 1 and 3 versus treatment 4 exclude 1, which accounts for the large F-value
and small P-value in Table 8.9. Adding the “ilink” option to the end of the
ESTIMATE statement prompts GLIMMIX to compute the inverse link of the linear predictors
$\hat\eta_{ci}$, that is, the cumulative probabilities $1/(1 + e^{-\hat\eta_{ci}})$ (Table 8.11).

Table 8.11 shows the estimates of $\eta_c + \tau_i$. For example, the linear predictor for a
chicken without lesions under treatment 1 is labeled “c = 0, t = 1”, that is,
$\hat\eta_0 + \hat\tau_1 = -0.8893$. This result matches the value obtained from the fixed effects
solution (“Solutions for fixed effects”) shown previously. Taking the inverse of the
link yields the probability
Fig. 8.1 Estimated probabilities for the footpad lesion categories in the treatments tested, using the
cumulative logit model


$\hat\pi_{01} = 1/(1 + e^{0.8893}) = 0.2914$. This is the estimated probability that a chicken will
have no footpad lesion under treatment 1; it appears in the “Mean” column of
Table 8.11. For the category “c = 1, t = 1”, the inverse of the linear predictor is 0.9149,
which estimates $\pi_{01} + \pi_{11}$. From this value, we obtain the probability of a chicken
showing a “Moderate” lesion under treatment 1: since $\hat\pi_{01} + \hat\pi_{11} = 0.9149$,
substituting $\hat\pi_{01}$ gives $\hat\pi_{11} = 0.9149 - 0.2914 = 0.6235$. Finally, for a “Severe” lesion
(category c = 2, t = 1), the probability is $\hat\pi_{21} = 1 - 0.9149 = 0.0851$. Following the same
procedure, we obtain the probabilities for each category (c = 0, 1, 2) under the remaining
treatments (2-4).

Figure 8.1 shows that under the traditional feeding program with a litter density
of 1 kg m⁻² of rice husks (Trt1), there is a high probability that broilers will
develop moderate and severe footpad lesions, with $\hat\pi_{11} = 0.624$ and
$\hat\pi_{21} = 0.085$, respectively. When the litter density was increased from 1 to 2 kg m⁻²
of rice husks under the traditional broiler program (Trt2), the probabilities of
developing moderate and severe footpad lesions decreased to
$\hat\pi_{12} = 0.384$ and $\hat\pi_{22} = 0.026$, respectively, compared with Trt1, whereas the probability
of not developing a footpad lesion increased to $\hat\pi_{02} = 0.590$ (Trt2) from
$\hat\pi_{01} = 0.291$ (Trt1). When the foot health program was combined with a litter density
of 2 kg m⁻² of rice husks (Trt4), the probability of chickens not developing a footpad
lesion was $\hat\pi_{04} = 0.649$, compared with $\hat\pi_{03} = 0.396$ in Trt3, whereas the probabilities
of developing moderate and severe lesions were $\hat\pi_{14} = 0.331$ and $\hat\pi_{24} = 0.020$ in Trt4,
compared with $\hat\pi_{13} = 0.549$ and $\hat\pi_{23} = 0.055$ in Trt3.
An ordinal cumulative probit model, first considered by Aitchison and Silvey
(1957), generalizes a binary probit model to ordinal responses. This model results
from the probit modeling of the cumulative probabilities as a linear function of the
covariates. The link functions for the cumulative probit model with C categories are
listed below:

$$\Phi^{-1}(\pi_1) = \eta_1 + X\beta + Zb$$

$$\Phi^{-1}(\pi_1 + \pi_2) = \eta_2 + X\beta + Zb$$

$$\vdots$$

$$\Phi^{-1}(\pi_1 + \pi_2 + \cdots + \pi_{C-1}) = \eta_{C-1} + X\beta + Zb$$


where $X$ and $Z$ are the design matrices, $\beta$ and $b$ are the vectors of fixed and random
effects parameters, respectively, and $\Phi^{-1}(\cdot)$ is the inverse of the standard
normal cumulative distribution function. The inverse of each link function is as
follows:

$$\pi_1 + \pi_2 + \cdots + \pi_c = \Phi(\eta_c + X\beta + Zb), \quad c = 1, 2, \ldots, C - 1$$


Once the cumulative probabilities $\pi_1$, $\pi_1 + \pi_2$, ..., $\pi_1 + \cdots + \pi_{C-1}$ are estimated, the
individual probabilities $\pi_1, \ldots, \pi_C$ follow by subtraction. For some datasets, though
not all, the estimates from the ordinal cumulative probit model are very similar to those
of the ordinal cumulative logit model. Both models imply a stochastic ordering of the
response across the levels of the predictors and are designed to detect shifts in the
location of the response distribution.

Returning to Example 8.3.2, to fit the cumulative probit model we change the
option to LINK=CPROBIT in the MODEL statement of the above program.
The output contains the same elements, except the odds ratios. The analysis
of the cumulative probit model proceeds exactly as for the cumulative logit
model. Part of the output is shown in parts (a)-(c) of Table 8.12.

The estimated variance component due to blocks is $\hat\sigma_b^2 = 0.0093$. The results of
the analysis of variance show that the degree of footpad lesion (pododermatitis)
differs significantly among the tested treatments (P < 0.0001).

In part (c) of Table 8.12, the estimated intercepts $\hat\eta_0 = 0.3880$ and $\hat\eta_1 = 2.2407$
define the boundary between the “Without” and “Moderate” lesion categories and the
boundary between the “Moderate” and “Severe” lesion categories, respectively. The
estimated treatment effects ($\hat\tau_i$) move both boundaries up or down when a given
treatment is applied. In this sense, all estimated treatment coefficients are negative
Table 8.12 Results of the analysis of variance in the multinomial cumulative probit model

(a) Covariance parameter estimates

Cov Parm    Subject   Estimate   Standard error
Intercept   Blk       0.009262   0.01817

(b) Type III tests of fixed effects

Effect   Num DF   Den DF   F-value   Pr > F
Trt      3        794      24.57     <0.0001

(c) Solutions for fixed effects

Effect           categoria   Trt   Estimate   Standard error
Intercept (η0)   Without           0.3880     0.1124
Intercept (η1)   Moderate          2.2407     0.1375
Trt (τ1)                     1     -0.9278    0.1227
Trt (τ2)                     2     -0.1595    0.1242
Trt (τ3)                     3     -0.6459    0.1219
Trt (τ4)                     4     0


with respect to treatment 4. This means that chickens under treatments 1-3 have a
lower probability of remaining free of footpad lesions and a higher probability of
developing moderate or severe lesions than chickens under treatment 4.

The probabilities for each category can be obtained from the fixed effects solution
(Table 8.12, part (c)). For the probability that a chicken will not develop a footpad
lesion (c = 0) under treatment 1, i.e., “c = 0, Trt = 1”, the estimated linear predictor is
$\hat\eta_{01} = \hat\eta_0 + \hat\tau_1 = 0.3880 + (-0.9278) = -0.5398$, and taking the inverse gives
$\hat\pi_{01} = \Phi(-0.5398) = 0.2947$, the estimated probability that a chicken will not develop
a footpad lesion when receiving treatment 1. For “c = 1, Trt = 1”,
$\hat\eta_{11} = \hat\eta_1 + \hat\tau_1 = 2.2407 + (-0.9278) = 1.3129$, whose inverse value is 0.9054.
This value estimates $\pi_{01} + \pi_{11}$. From it, we obtain the probabilities that a
chicken will develop a moderate lesion, $\hat\pi_{11} = 0.9054 - \hat\pi_{01} = 0.9054 - 0.2947 = 0.6107$,
and a severe lesion, $\hat\pi_{21} = 1 - 0.9054 = 0.0946$. The category probabilities
(c = 0, 1, 2) for the remaining treatments are obtained in the same way.
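To see how close the two link functions are for these data, the Python sketch below computes the treatment 1 probabilities from both the probit estimates (Table 8.12, part (c)) and the logit estimates (Table 8.9, part (b)); `statistics.NormalDist` supplies $\Phi$. It is an illustration using rounded estimates, not a replacement for the GLIMMIX fit, and the helper name is our own.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the inverse of the probit link


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def three_category_probs(eta0, eta1, tau, inv_link):
    """Category probabilities (none, slight, severe) from two boundaries."""
    cum0 = inv_link(eta0 + tau)
    cum1 = inv_link(eta1 + tau)
    return (cum0, cum1 - cum0, 1.0 - cum1)


# Treatment 1: probit estimates (Table 8.12c) vs. logit estimates (Table 8.9b)
probit_trt1 = three_category_probs(0.3880, 2.2407, -0.9278, PHI)
logit_trt1 = three_category_probs(0.6144, 3.8787, -1.5034, inv_logit)
print("probit:", [round(p, 4) for p in probit_trt1])
print("logit: ", [round(p, 4) for p in logit_trt1])
```

The intercepts and treatment effects differ in scale between the two links (probit coefficients are roughly 0.6 times the logit ones here), yet the resulting category probabilities agree to within about 0.01.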

As in the previous example, adding the ILINK option to the end of the
ESTIMATE statement prompts GLIMMIX to report both the linear predictors
($\hat\eta_{ci}$) and their inverse values, which here are the cumulative probabilities
$\Phi(\hat\eta_{ci})$. Table 8.13 shows the estimates of the linear predictors together with
their inverse values (probabilities in this case).

In this table, we show the estimates of $\eta_c + \tau_i$. For example, the estimated linear
predictor for a chicken without a footpad lesion under treatment 1 (“c = 0, t = 1”) is
$\hat\eta_0 + \hat\tau_1 = -0.5398$. This result matches the value obtained from the fixed effects
solution (“Solutions for fixed effects”) shown previously. Taking the inverse of the link
function gives $\hat\pi_{01} = \Phi(-0.5398) = 0.2947$. This is the
Table 8.13 Estimates on the model scale (Estimate) and on the data scale (Mean) for footpad
lesion categories in the multinomial cumulative probit model

Estimates

Label      Estimate   Standard error   DF    t-value   Pr > |t|   Mean     Standard error mean
c=0, t=1   -0.5398    0.1100           794   -4.91     <0.0001    0.2947   0.03793
c=1, t=1   1.3129     0.1208           794   10.87     <0.0001    0.9054   0.02035
c=0, t=2   0.2285     0.1105           794   2.07      0.0389     0.5904   0.04293
c=1, t=2   2.0812     0.1345           794   15.47     <0.0001    0.9813   0.006153
c=0, t=3   -0.2578    0.1085           794   -2.38     0.0178     0.3983   0.04189
c=1, t=3   1.5949     0.1258           794   12.68     <0.0001    0.9446   0.01407
c=0, t=4   0.3880     0.1124           794   3.45      0.0006     0.6510   0.04158
c=1, t=4   2.2407     0.1375           794   16.29     <0.0001    0.9875   0.004457


probability that a chicken will not develop a footpad lesion when receiving treatment 
1. This probability is under the “Mean” column. 

Now, for the category c = 1, t = 1, the inverse of the link function gives a probability of 0.9054, which results from the inverse of the linear predictor $\hat{\eta}_1 + \hat{\tau}_1 = 1.3129$. This value estimates the cumulative probability $\pi_{01} + \pi_{11}$, i.e., the probability that a chicken receiving treatment 1 presents either no lesion or a “Moderate” lesion. Using $\hat{\pi}_{01} = 0.2947$, we obtain $\hat{\pi}_{11} = 0.9054 - 0.2947 = 0.6107$ and $\hat{\pi}_{21} = 1 - 0.9054 = 0.0946$. Following the same procedure, we can obtain the probabilities of each of the categories (c = 0, 1, 2) for the remaining treatments (2, 3, and 4).
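The same steps can be applied to every treatment. The minimal Python sketch below assumes that the “Estimate” column of Table 8.13 holds $\eta_{0j}$ and $\eta_{1j}$ for each treatment j (for treatment 4 it takes 0.3880 as the c = 0 predictor, consistent with the intercepts quoted earlier) and converts each pair into cumulative and then category probabilities. The dictionary name is ours.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF (inverse of the probit link)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Linear predictors (eta_0j, eta_1j) from the "Estimate" column of Table 8.13
table_8_13 = {
    1: (-0.5398, 1.3129),
    2: (0.2285, 2.0812),
    3: (-0.2578, 1.5949),
    4: (0.3880, 2.2407),
}

category_probs = {}
for trt, (eta_0, eta_1) in table_8_13.items():
    cum_0, cum_1 = phi(eta_0), phi(eta_1)        # cumulative probabilities ("Mean" column)
    category_probs[trt] = (cum_0, cum_1 - cum_0, 1.0 - cum_1)

for trt, (p0, p1, p2) in category_probs.items():
    print(f"Treatment {trt}: without={p0:.4f} moderate={p1:.4f} severe={p2:.4f}")
```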


8.5 Effect of Judges’ Experience on Canned Bean Quality 
Ratings 


Canning quality is one of the most essential traits required in all new dry bean (Phaseolus vulgaris L.) varieties, and selection for this trait is a critical part of bean breeding programs. Advanced lines that are candidates for release as varieties must be evaluated for canning quality for at least 3 years from samples grown at different locations. Quality is evaluated by a panel of judges with varying levels of experience in evaluating breeding lines for visual quality traits. A total of 264 bean breeding lines from 4 commercial classes were retained according to the procedures described by Walters et al. (1997). These included 62 white (navy), 65 black, 55 kidney, and 82 pinto bean lines plus control or “check” lines. The visual appearance of the processed beans was determined subjectively by a panel of 13 judges on a 7-point hedonic scale (1 = very undesirable, ..., 4 = neither desirable nor undesirable, ..., 7 = very desirable). Beans were presented to the panel of judges in random order at the same time. Before evaluating the samples, all judges were shown examples of samples rated as satisfactory.

There is concern that some judges, due to lack of experience, may not be able to score the canned samples correctly. Inferences about the effects of experience on attribute-based product evaluations can be drawn from the psychology literature (Wallsten and Budescu 1981). Prior to the bean canning quality rating experiment, it was postulated not only that less experienced judges rate more severely than more experienced judges, but also that experience should have little or no effect on white (navy) beans, for which the canning procedure was developed. For the analysis, judges are stratified by experience (less than 5 years, more than 5 years). Counts by canning quality rating, judge experience, and bean class are listed in Table 8.14.

Table 8.14 Frequency of ratings of different types of beans as a function of the bean-rating experience

          Black              Kidney             Navy               Pinto
Rating    <5 years >5 years  <5 years >5 years  <5 years >5 years  <5 years >5 years
1         13       32        7        10        10       22        13       2
2         91       78        32       31        56       51        29       17
3         123      124       136      96        84       107       91       68
4         72       122       101      104       84       98        109      124
5         24       31        47       71        51       52        60       109
6         2        3         6        18        24       37        25       78
7         0        0         1        0         1        5         1        12

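The link functions of the cumulative logit model for a variable with C categories are as follows:

$$\eta_{1ij} = \log\left(\frac{\pi_{1ij}}{1-\pi_{1ij}}\right) = \eta_1 + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

$$\eta_{2ij} = \log\left(\frac{\pi_{1ij}+\pi_{2ij}}{1-(\pi_{1ij}+\pi_{2ij})}\right) = \eta_2 + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

$$\vdots$$

$$\eta_{(C-1)ij} = \log\left(\frac{\pi_{1ij}+\cdots+\pi_{(C-1)ij}}{1-(\pi_{1ij}+\cdots+\pi_{(C-1)ij})}\right) = \eta_{C-1} + \alpha_i + \beta_j + (\alpha\beta)_{ij}$$

The components of the GLMM with an ordinal multinomial response are as follows: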
Distributions: $(y_{1ij}, y_{2ij}, \ldots, y_{7ij}) \sim \text{Multinomial}(N_{ij}, \pi_{1ij}, \pi_{2ij}, \ldots, \pi_{7ij})$, where $y_{1ij}, y_{2ij}, \ldots, y_{7ij}$ are the observed frequencies of the responses in each category c of the hedonic scale (1 = very undesirable, ..., 4 = neither desirable nor undesirable, ..., 7 = very desirable).

Linear predictor: $\eta_{cij} = \eta_c + \alpha_i + \beta_j + (\alpha\beta)_{ij}$, where $\eta_{cij}$ is the cth link (c = 1, 2, ..., 6) for bean class i and judge experience j; $\eta_c$ is the intercept for the cth link; $\alpha_i$ is the fixed effect of the ith bean class; $\beta_j$ is the fixed effect of the jth level of judge experience; and $(\alpha\beta)_{ij}$ is the fixed effect of the interaction between bean class and judge experience. The link functions for each category are as follows:


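$$\log\left(\frac{\pi_{1ij}}{1-\pi_{1ij}}\right) = \eta_{1ij}$$

$$\log\left(\frac{\pi_{1ij}+\pi_{2ij}}{1-(\pi_{1ij}+\pi_{2ij})}\right) = \eta_{2ij}$$

$$\log\left(\frac{\pi_{1ij}+\pi_{2ij}+\pi_{3ij}}{1-(\pi_{1ij}+\pi_{2ij}+\pi_{3ij})}\right) = \eta_{3ij}$$

$$\log\left(\frac{\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij}}{1-(\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij})}\right) = \eta_{4ij}$$

$$\log\left(\frac{\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij}+\pi_{5ij}}{1-(\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij}+\pi_{5ij})}\right) = \eta_{5ij}$$

$$\log\left(\frac{\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij}+\pi_{5ij}+\pi_{6ij}}{1-(\pi_{1ij}+\pi_{2ij}+\pi_{3ij}+\pi_{4ij}+\pi_{5ij}+\pi_{6ij})}\right) = \eta_{6ij}$$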


The following GLIMMIX statements fit a cumulative logit model with an ordinal multinomial response.


proc glimmix data=beans;
  class Class Exper;
  model cal(order=data) = Exper|Class / dist=multinomial link=clogit
                                       solution oddsratio;

  /* Effect of judge experience (<5 vs >5 years) within each bean class */
  contrast 'Effect of experience on black beans'  exper 1 -1 class*exper 1 -1 0 0 0 0 0 0;
  contrast 'Effect of experience on kidney beans' exper 1 -1 class*exper 0 0 1 -1 0 0 0 0;
  contrast 'Effect of experience on navy beans'   exper 1 -1 class*exper 0 0 0 0 1 -1 0 0;
  contrast 'Effect of experience on pinto beans'  exper 1 -1 class*exper 0 0 0 0 0 0 1 -1;

  /* Cumulative logits and, with ILINK, cumulative probabilities P(rating <= k).
     The kth intercept coefficient selects the kth cutpoint.
     Coefficient order: intercepts 1-6; Class Black, Kidney, Navy, Pinto;
     Exper 1 (<5 years), 2 (>5 years); Class*Exper with Class varying slowest. */
  estimate 'Black, <5 years, rating = 1'   intercept 1 0 0 0 0 0 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, <5 years, rating <= 2'  intercept 0 1 0 0 0 0 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, <5 years, rating <= 3'  intercept 0 0 1 0 0 0 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, <5 years, rating <= 4'  intercept 0 0 0 1 0 0 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, <5 years, rating <= 5'  intercept 0 0 0 0 1 0 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, <5 years, rating <= 6'  intercept 0 0 0 0 0 1 class 1 0 0 0 exper 1 0 class*exper 1 0 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating = 1'   intercept 1 0 0 0 0 0 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating <= 2'  intercept 0 1 0 0 0 0 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating <= 3'  intercept 0 0 1 0 0 0 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating <= 4'  intercept 0 0 0 1 0 0 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating <= 5'  intercept 0 0 0 0 1 0 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Black, >5 years, rating <= 6'  intercept 0 0 0 0 0 1 class 1 0 0 0 exper 0 1 class*exper 0 1 0 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating = 1'  intercept 1 0 0 0 0 0 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating <= 2' intercept 0 1 0 0 0 0 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating <= 3' intercept 0 0 1 0 0 0 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating <= 4' intercept 0 0 0 1 0 0 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating <= 5' intercept 0 0 0 0 1 0 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, <5 years, rating <= 6' intercept 0 0 0 0 0 1 class 0 1 0 0 exper 1 0 class*exper 0 0 1 0 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating = 1'  intercept 1 0 0 0 0 0 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating <= 2' intercept 0 1 0 0 0 0 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating <= 3' intercept 0 0 1 0 0 0 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating <= 4' intercept 0 0 0 1 0 0 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating <= 5' intercept 0 0 0 0 1 0 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Kidney, >5 years, rating <= 6' intercept 0 0 0 0 0 1 class 0 1 0 0 exper 0 1 class*exper 0 0 0 1 0 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating = 1'    intercept 1 0 0 0 0 0 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating <= 2'   intercept 0 1 0 0 0 0 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating <= 3'   intercept 0 0 1 0 0 0 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating <= 4'   intercept 0 0 0 1 0 0 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating <= 5'   intercept 0 0 0 0 1 0 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, <5 years, rating <= 6'   intercept 0 0 0 0 0 1 class 0 0 1 0 exper 1 0 class*exper 0 0 0 0 1 0 0 0 / ilink;
  estimate 'Navy, >5 years, rating = 1'    intercept 1 0 0 0 0 0 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Navy, >5 years, rating <= 2'   intercept 0 1 0 0 0 0 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Navy, >5 years, rating <= 3'   intercept 0 0 1 0 0 0 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Navy, >5 years, rating <= 4'   intercept 0 0 0 1 0 0 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Navy, >5 years, rating <= 5'   intercept 0 0 0 0 1 0 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Navy, >5 years, rating <= 6'   intercept 0 0 0 0 0 1 class 0 0 1 0 exper 0 1 class*exper 0 0 0 0 0 1 0 0 / ilink;
  estimate 'Pinto, <5 years, rating = 1'   intercept 1 0 0 0 0 0 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, <5 years, rating <= 2'  intercept 0 1 0 0 0 0 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, <5 years, rating <= 3'  intercept 0 0 1 0 0 0 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, <5 years, rating <= 4'  intercept 0 0 0 1 0 0 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, <5 years, rating <= 5'  intercept 0 0 0 0 1 0 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, <5 years, rating <= 6'  intercept 0 0 0 0 0 1 class 0 0 0 1 exper 1 0 class*exper 0 0 0 0 0 0 1 0 / ilink;
  estimate 'Pinto, >5 years, rating = 1'   intercept 1 0 0 0 0 0 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;
  estimate 'Pinto, >5 years, rating <= 2'  intercept 0 1 0 0 0 0 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;
  estimate 'Pinto, >5 years, rating <= 3'  intercept 0 0 1 0 0 0 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;
  estimate 'Pinto, >5 years, rating <= 4'  intercept 0 0 0 1 0 0 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;
  estimate 'Pinto, >5 years, rating <= 5'  intercept 0 0 0 0 1 0 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;
  estimate 'Pinto, >5 years, rating <= 6'  intercept 0 0 0 0 0 1 class 0 0 0 1 exper 0 1 class*exper 0 0 0 0 0 0 0 1 / ilink;

  freq y;   /* number of judgments in each rating/class/experience cell */
run;
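Because the 48 ESTIMATE statements differ only in where a single 1 sits within each coefficient vector, they are easy to mistype. The following hypothetical Python helper (not part of SAS or GLIMMIX) generates them, assuming the parameter order shown in Table 8.17: six intercepts, four bean classes (Black, Kidney, Navy, Pinto), two experience levels (1 = <5 years, 2 = >5 years), and eight Class*Exper cells with Class varying slowest. Its output can be pasted into the PROC GLIMMIX step.

```python
CLASSES = ["Black", "Kidney", "Navy", "Pinto"]
EXPER = ["<5 years", ">5 years"]
N_CUTPOINTS = 6   # C - 1 intercepts for a 7-point scale

def indicator(position, length):
    """Space-separated 0/1 coefficient vector with a single 1 at `position` (0-based)."""
    return " ".join("1" if k == position else "0" for k in range(length))

def estimate_statements():
    """Build one ESTIMATE statement per bean class x experience x cutpoint."""
    lines = []
    for i, bean in enumerate(CLASSES):
        for j, exper in enumerate(EXPER):
            for k in range(N_CUTPOINTS):
                rating = "rating = 1" if k == 0 else f"rating <= {k + 1}"
                lines.append(
                    f"estimate '{bean}, {exper}, {rating}' "
                    f"intercept {indicator(k, N_CUTPOINTS)} "
                    f"class {indicator(i, len(CLASSES))} "
                    f"exper {indicator(j, len(EXPER))} "
                    # Class varies slowest, so cell (i, j) is at position 2*i + j
                    f"class*exper {indicator(2 * i + j, len(CLASSES) * len(EXPER))} / ilink;"
                )
    return lines

statements = estimate_statements()
print(len(statements))
print(statements[0])
```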
Table 8.15 Fixed effects hypothesis testing in the multinomial cumulative logit model

Type III tests of fixed effects
Effect         Num DF   Den DF   F-value   Pr > F
Class          3        2779     85.20     <0.0001
Exper          1        2779     36.19     <0.0001
Class*Exper    3        2779     10.13     <0.0001


Table 8.16 Hypothesis testing in quality assessment

Contrasts
Label                                  Num DF   Den DF   F-value   Pr > F
Effect of experience on black beans    1        2779     2.77      0.0961
Effect of experience on kidney beans   1        2779     7.86      0.0051
Effect of experience on navy beans     1        2779     0.02      0.8822
Effect of experience on pinto beans    1        2779     58.06     <0.0001


Part of the results is shown below. The analysis of variance shows that bean class (Class), judge experience (Exper), and the interaction between class and experience (Class×Exper) have significant effects on the canning quality scores (P < 0.0001; Table 8.15). That is, the difference between judges with more and with fewer years of experience depends on the bean class.

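The contrasts in Table 8.16 address this interaction. For each bean class i, the hypothesis tested is $H_0: \eta_{ci1} = \eta_{ci2}$, i.e., the linear predictor for judges with less than 5 years of experience equals that for judges with more than 5 years of experience, which is equivalent to $\beta_1 - \beta_2 + (\alpha\beta)_{i1} - (\alpha\beta)_{i2} = 0$.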
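Each contrast is a linear combination of the estimates in Table 8.17. The Python sketch below computes only the point estimates $\hat{\beta}_1 - \hat{\beta}_2 + \widehat{\alpha\beta}_{i1} - \widehat{\alpha\beta}_{i2}$ and their exponentials (cumulative odds ratios, <5 vs >5 years); the F tests in Table 8.16 also need the covariance matrix of the estimates, which GLIMMIX supplies and which is not reproduced here. Assuming, as in Table 8.18, that the inverse link of $\eta$ is $P(\text{rating} \le c)$, a positive value means that less experienced judges are more likely to give low ratings. The function name is ours.

```python
from math import exp

# Rounded estimates from Table 8.17 (reference levels: Pinto, and Exper = 2, i.e., >5 years)
beta = {1: 1.0284, 2: 0.0}
alpha_beta = {
    ("Black", 1): -0.8066, ("Black", 2): 0.0,
    ("Kidney", 1): -0.6457, ("Kidney", 2): 0.0,
    ("Navy", 1): -1.0072, ("Navy", 2): 0.0,
    ("Pinto", 1): 0.0, ("Pinto", 2): 0.0,
}

def experience_contrast(bean):
    """Point estimate of 'exper 1 -1 class*exper ... 1 -1 ...' within one bean class."""
    return beta[1] - beta[2] + alpha_beta[(bean, 1)] - alpha_beta[(bean, 2)]

for bean in ["Black", "Kidney", "Navy", "Pinto"]:
    estimate = experience_contrast(bean)
    print(f"{bean:<6} difference = {estimate:.4f}, cumulative odds ratio = {exp(estimate):.3f}")
```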

The results show that judges with more than 5 years of experience differ from those with less than 5 years of experience in evaluating the quality of canned kidney (P = 0.0051) and pinto (P < 0.0001) beans, but not of black (P = 0.0961) or navy (P = 0.8822) beans (Table 8.16). With the “solution” option in the MODEL statement, the fixed parameter estimates table (Table 8.17) shows the maximum likelihood solution for the fixed effects parameters. In this table, we can observe the estimated intercepts: $\hat{\eta}_1 = -4.6421$ defines the boundary between the categories “1 = very undesirable” and “2 = moderately undesirable”, whereas $\hat{\eta}_2 = -2.9316$ defines the boundary between the categories “2 = moderately undesirable” and “3 = slightly undesirable”. The third intercept, $\hat{\eta}_3 = -1.3995$, defines the boundary between the categories “3 = slightly undesirable” and “4 = neither undesirable nor desirable”, and so on.


The estimated effects of bean class ($\hat{\alpha}_i$), judge experience ($\hat{\beta}_j$), and their interaction ($\widehat{\alpha\beta}_{ij}$) are shown in Table 8.17. From these values, we can estimate the linear predictors for each of the cumulative categories. For example, for canned black beans evaluated by an inexperienced judge (i = 1, j = 1), the linear predictor for the category “1 = very undesirable” is $\hat{\eta}_{111} = \hat{\eta}_1 + \hat{\alpha}_1 + \hat{\beta}_1 + \widehat{\alpha\beta}_{11} = -4.6421 + 1.9670 + 1.0284 - 0.8066 = -2.4533$; for the categories up to “2 = moderately undesirable” it is $\hat{\eta}_{211} = \hat{\eta}_2 + \hat{\alpha}_1 + \hat{\beta}_1 + \widehat{\alpha\beta}_{11} = -2.9316 + 1.9670 + 1.0284 - 0.8066 = -0.7428$; for the categories up to “3 = slightly undesirable” it is $\hat{\eta}_{311} = \hat{\eta}_3 + \hat{\alpha}_1 + \hat{\beta}_1 + \widehat{\alpha\beta}_{11} = -1.3995 + 1.9670 + 1.0284 - 0.8066 = 0.7893$; and for the categories up to “4 = neither undesirable nor desirable” it is $\hat{\eta}_{411} = \hat{\eta}_4 + \hat{\alpha}_1 + \hat{\beta}_1 + \widehat{\alpha\beta}_{11} = 0.004287 + 1.9670 + 1.0284 - 0.8066 = 2.1931$. The linear predictors for the remaining categories, bean classes, and levels of judge experience are calculated in the same way from Table 8.17.

Table 8.17 Maximum likelihood estimates of the fixed effects parameters for canned bean quality ratings in the multinomial cumulative logit model

Fixed parameter estimates
Effect        Cal  Class   Exper  Parameter                Estimate   Standard error  DF    t-value  Pr > |t|
Intercept     1                   $\eta_1$                 -4.6421    0.1363          2779  -34.05   <0.0001
Intercept     2                   $\eta_2$                 -2.9316    0.1057          2779  -27.74   <0.0001
Intercept     3                   $\eta_3$                 -1.3995    0.09643         2779  -14.51   <0.0001
Intercept     4                   $\eta_4$                 0.004287   0.09230         2779  0.05     0.9630
Intercept     5                   $\eta_5$                 1.4191     0.1026          2779  13.84    <0.0001
Intercept     6                   $\eta_6$                 3.8925     0.2346          2779  16.59    <0.0001
Class              Black          $\alpha_1$               1.9670     0.1318          2779  14.93    <0.0001
Class              Kidney         $\alpha_2$               1.0472     0.1342          2779  7.80     <0.0001
Class              Navy           $\alpha_3$               1.3076     0.1345          2779  9.72     <0.0001
Class              Pinto          $\alpha_4$               0          .               .     .        .
Exper                      1      $\beta_1$                1.0284     0.1350          2779  7.62     <0.0001
Exper                      2      $\beta_2$                0          .               .     .        .
Class*Exper        Black   1      $(\alpha\beta)_{11}$     -0.8066    0.1894          2779  -4.26    <0.0001
Class*Exper        Black   2      $(\alpha\beta)_{12}$     0          .               .     .        .
Class*Exper        Kidney  1      $(\alpha\beta)_{21}$     -0.6457    0.1912          2779  -3.38    0.0007
Class*Exper        Kidney  2      $(\alpha\beta)_{22}$     0          .               .     .        .
Class*Exper        Navy    1      $(\alpha\beta)_{31}$     -1.0072    0.1969          2779  -5.12    <0.0001
Class*Exper        Navy    2      $(\alpha\beta)_{32}$     0          .               .     .        .
Class*Exper        Pinto   1      $(\alpha\beta)_{41}$     0          .               .     .        .
Class*Exper        Pinto   2      $(\alpha\beta)_{42}$     0          .               .     .        .
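The linear predictors and cumulative probabilities for black beans rated by judges with less than 5 years of experience can be reproduced with the minimal Python sketch below. It assumes the logistic inverse link $P(\text{rating} \le c) = 1/(1 + e^{-\eta_c})$ and uses the rounded estimates of Table 8.17, so small discrepancies in the fourth decimal relative to Table 8.18 are due to rounding. Variable names are ours.

```python
from math import exp

def inv_logit(x):
    """Inverse of the logit link: P(rating <= c) for a cumulative logit."""
    return 1.0 / (1.0 + exp(-x))

# Rounded estimates from Table 8.17
cutpoints = [-4.6421, -2.9316, -1.3995, 0.004287, 1.4191, 3.8925]   # eta_1 ... eta_6
alpha_black = 1.9670             # Class = Black
beta_less_than_5 = 1.0284        # Exper = 1 (<5 years)
ab_black_less_than_5 = -0.8066   # Class*Exper = Black, 1

shift = alpha_black + beta_less_than_5 + ab_black_less_than_5   # common to all six logits
eta = [c + shift for c in cutpoints]        # model scale ("Estimate")
cum_prob = [inv_logit(e) for e in eta]      # data scale ("Mean")

for c, (e, p) in enumerate(zip(eta, cum_prob), start=1):
    print(f"rating <= {c}: eta = {e:.4f}, P = {p:.4f}")
```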
The results in Table 8.18 were obtained with the “estimate” statements in conjunction with the “ilink” option, which prompts GLIMMIX to compute the values of the linear predictors $\eta_{cij}$, tabulated under the “Estimate” column, and the corresponding estimated cumulative probabilities, tabulated under the “Mean” column, for all categories except the last (reference) category.

Table 8.18 Estimates on the model scale (Estimate) and on the data scale (Mean) based on judges’ experience in canned bean quality ratings in the multinomial cumulative logit model

Estimates
Label                           Estimate   Standard error  DF    t-value  Pr > |t|  Mean      Standard error mean
Black, <5 years, rating = 1     -2.4533    0.1292          2779  -18.99   <0.0001   0.07920   0.009419
Black, <5 years, rating <= 2    -0.7428    0.1004          2779  -7.40    <0.0001   0.3224    0.02194
Black, <5 years, rating <= 3    0.7893     0.1008          2779  7.83     <0.0001   0.6877    0.02164
Black, <5 years, rating <= 4    2.1931     0.1076          2779  20.38    <0.0001   0.8996    0.009716
Black, <5 years, rating <= 5    3.6079     0.1238          2779  29.15    <0.0001   0.9736    0.003180
Black, <5 years, rating <= 6    6.0814     0.2467          2779  24.65    <0.0001   0.9977    0.000561
Black, >5 years, rating = 1     -2.6751    0.1264          2779  -21.17   <0.0001   0.06446   0.007621
Black, >5 years, rating <= 2    -0.9646    0.09577         2779  -10.07   <0.0001   0.2760    0.01913
Black, >5 years, rating <= 3    0.5675     0.09314         2779  6.09     <0.0001   0.6382    0.02151
Black, >5 years, rating <= 4    1.9713     0.09967         2779  19.78    <0.0001   0.8778    0.01069
Black, >5 years, rating <= 5    3.3861     0.1170          2779  28.95    <0.0001   0.9673    0.003704
Black, >5 years, rating <= 6    5.8595     0.2434          2779  24.07    <0.0001   0.9972    0.000690
Kidney, <5 years, rating = 1    -3.2122    0.1333          2779  -24.11   <0.0001   0.03871   0.004958
Kidney, <5 years, rating <= 2   -1.5017    0.1018          2779  -14.74   <0.0001   0.1822    0.01517
Kidney, <5 years, rating <= 3   0.03040    0.09608         2779  0.32     0.7517    0.5076    0.02401
Kidney, <5 years, rating <= 4   1.4342     0.1011          2779  14.19    <0.0001   0.8076    0.01571
Kidney, <5 years, rating <= 5   2.8490     0.1178          2779  24.18    <0.0001   0.9453    0.006096
Kidney, <5 years, rating <= 6   5.3225     0.2438          2779  21.83    <0.0001   0.9951    0.001179
Kidney, >5 years, rating = 1    -3.5949    0.1372          2779  -26.20   <0.0001   0.02673   0.003569
Kidney, >5 years, rating <= 2   -1.8844    0.1071          2779  -17.60   <0.0001   0.1319    0.01226
Kidney, >5 years, rating <= 3   -0.3523    0.09988         2779  -3.53    0.0004    0.4128    0.02421
Kidney, >5 years, rating <= 4   1.0515     0.1020          2779  10.31    <0.0001   0.7411    0.01957
Kidney, >5 years, rating <= 5   2.4663     0.1176          2779  20.98    <0.0001   0.9217    0.008480
Kidney, >5 years, rating <= 6   4.9397     0.2436          2779  20.27    <0.0001   0.9929    0.001719
Navy, <5 years, rating = 1      -3.3133    0.1404          2779  -23.60   <0.0001   0.03512   0.004757
Navy, <5 years, rating <= 2     -1.6027    0.1119          2779  -14.33   <0.0001   0.1676    0.01561
Navy, <5 years, rating <= 3     -0.07066   0.1068          2779  -0.66    0.5084    0.4823    0.02667
Navy, <5 years, rating <= 4     1.3332     0.1102          2779  12.10    <0.0001   0.7914    0.01820
Navy, <5 years, rating <= 5     2.7479     0.1251          2779  21.97    <0.0001   0.9398    0.007077
Navy, <5 years, rating <= 6     5.2214     0.2473          2779  21.12    <0.0001   0.9946    0.001321
Navy, >5 years, rating = 1      -3.3345    0.1348          2779  -24.74   <0.0001   0.03441   0.004479
Navy, >5 years, rating <= 2     -1.6240    0.1047          2779  -15.51   <0.0001   0.1647    0.01440
Navy, >5 years, rating <= 3     -0.09190   0.09897         2779  -0.93    0.3532    0.4770    0.02469
Navy, >5 years, rating <= 4     1.3119     0.1028          2779  12.76    <0.0001   0.7878    0.01719
Navy, >5 years, rating <= 5     2.7267     0.1186          2779  22.99    <0.0001   0.9386    0.006836
Navy, >5 years, rating <= 6     5.2002     0.2439          2779  21.32    <0.0001   0.9945    0.001331
Pinto, <5 years, rating = 1     -3.6137    0.1380          2779  -26.19   <0.0001   0.02624   0.003527
Pinto, <5 years, rating <= 2    -1.9032    0.1081          2779  -17.61   <0.0001   0.1297    0.01221
Pinto, <5 years, rating <= 3    -0.3711    0.1008          2779  -3.68    0.0002    0.4083    0.02436
Pinto, <5 years, rating <= 4    1.0327     0.1030          2779  10.03    <0.0001   0.7374    0.01994
Pinto, <5 years, rating <= 5    2.4475     0.1184          2779  20.67    <0.0001   0.9204    0.008678
Pinto, <5 years, rating <= 6    4.9210     0.2439          2779  20.17    <0.0001   0.9928    0.001753
Pinto, >5 years, rating = 1     -4.6421    0.1363          2779  -34.05   <0.0001   0.009545  0.001289
Pinto, >5 years, rating <= 2    -2.9316    0.1057          2779  -27.74   <0.0001   0.05061   0.005078
Pinto, >5 years, rating <= 3    -1.3995    0.09643         2779  -14.51   <0.0001   0.1979    0.01531
Pinto, >5 years, rating <= 4    0.004287   0.09230         2779  0.05     0.9630    0.5011    0.02307
Pinto, >5 years, rating <= 5    1.4191     0.1026          2779  13.84    <0.0001   0.8052    0.01609
Pinto, >5 years, rating <= 6    3.8925     0.2346          2779  16.59    <0.0001   0.9800    0.004595

From the “Mean” column of Table 8.18, an inexperienced (<5 years) judge would rate canned black beans as category 1 (1 = very undesirable) with probability $\hat{\pi}_{111} = 0.0792 \approx 0.08$, compared with $\hat{\pi}_{112} = 0.0645$ for an experienced judge (>5 years). To calculate the probability that a judge with less than 5 years of experience assigns a rating of 2 (2 = moderately undesirable) to canned black beans, we use the cumulative probability 0.3224, which corresponds to $\pi_{111} + \pi_{211}$; hence $\hat{\pi}_{211} = 0.3224 - \hat{\pi}_{111} = 0.3224 - 0.0792 \approx 0.24$. For an experienced judge (>5 years), the probability of assigning a score of 2 to canned black beans is $\hat{\pi}_{212} = 0.2760 - \hat{\pi}_{112} = 0.2760 - 0.06446 = 0.2115$.

Following the same procedure, the other probabilities for the rest of the categories 
are obtained. The probabilities calculated for each of the categories are shown in 
Table 8.19 and can be seen in Fig. 8.2. 
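The differencing step can be applied to every combination of bean class and judge experience at once. The Python sketch below (for illustration only) recomputes the cumulative probabilities from the rounded estimates in Table 8.17, appends $P(\text{rating} \le 7) = 1$, and takes successive differences; rounded to two decimals, the results should agree with Table 8.19 up to rounding. The labels J1 and J2 follow Table 8.19; the dictionary and function names are ours.

```python
from math import exp

def inv_logit(x):
    return 1.0 / (1.0 + exp(-x))

# Rounded estimates from Table 8.17 (reference levels: Pinto, and J2 = >5 years)
cutpoints = [-4.6421, -2.9316, -1.3995, 0.004287, 1.4191, 3.8925]
alpha = {"Black": 1.9670, "Kidney": 1.0472, "Navy": 1.3076, "Pinto": 0.0}
beta = {"J1": 1.0284, "J2": 0.0}          # J1: <5 years, J2: >5 years
alpha_beta = {("Black", "J1"): -0.8066, ("Kidney", "J1"): -0.6457,
              ("Navy", "J1"): -1.0072}    # all other cells are 0

def category_probabilities(bean, judge):
    """P(rating = 1), ..., P(rating = 7) for one bean class and judge group."""
    shift = alpha[bean] + beta[judge] + alpha_beta.get((bean, judge), 0.0)
    cumulative = [inv_logit(c + shift) for c in cutpoints] + [1.0]   # P(rating <= 7) = 1
    return [cumulative[0]] + [cumulative[c] - cumulative[c - 1] for c in range(1, 7)]

for bean in alpha:
    for judge in beta:
        probs = category_probabilities(bean, judge)
        print(bean, judge, " ".join(f"{p:.2f}" for p in probs))
```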


8.6 Generalized Logit Models: Nominal Response Variables 


In a model for unordered data, the polytomous response variable has no ordered structure. Two classes of models, generalized logit models and conditional logit models, can be used with nominal response data. A generalized logit model consists of several binary logits estimated simultaneously. The logit model is the simplest and best-known probabilistic choice model; however, the multinomial cumulative logit model can be too inflexible because it assumes that the explanatory variables have the same effect on every cumulative logit. A generalized logit model is more flexible than the multinomial cumulative logit model in this respect.

A generalized logit model offers flexibility similar to that of a probit model but is much more tractable. Like cumulative logit and probit models, a generalized logit model has $C - 1$ link functions, where C denotes the number of response categories.
Table 8.19 Probabilities calculated for each of the canned bean grades

Class    Judge  Cal1  Cal2  Cal3  Cal4  Cal5  Cal6  Cal7
Black    J1     0.08  0.24  0.37  0.21  0.07  0.02  0.00
         J2     0.06  0.21  0.36  0.24  0.09  0.03  0.00
Kidney   J1     0.04  0.14  0.33  0.30  0.14  0.05  0.00
         J2     0.03  0.11  0.28  0.33  0.18  0.07  0.01
Navy     J1     0.04  0.13  0.31  0.31  0.15  0.05  0.01
         J2     0.03  0.13  0.31  0.31  0.15  0.06  0.01
Pinto    J1     0.03  0.10  0.28  0.33  0.18  0.07  0.01
         J2     0.01  0.04  0.15  0.30  0.30  0.17  0.02

Cal1 = rating 1, Cal2 = rating 2, ..., Cal7 = rating 7; J1 = panelist with less than 5 years’ experience, and J2 = panelist with more than 5 years’ experience

Fig. 8.2 Estimated probabilities for each category of the acceptability of canned beans, according to the experience of the panelist (judge)


Moreover, in this class of models, one category is first designated as the reference category. The choice may be arbitrary, or it may make compelling logical sense in the study to designate a particular response category as the reference. In practice, which category is used as the reference does not matter, as long as it is used consistently throughout the analysis. For example, if category C is used as the reference, the generalized logits are defined as follows:
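$$\eta_{1ij} = \log\left(\frac{\pi_{1ij}}{\pi_{Cij}}\right) = \alpha_1 + \mathbf{X}\boldsymbol{\beta}_1 + \mathbf{Z}\mathbf{b}_1$$

$$\eta_{2ij} = \log\left(\frac{\pi_{2ij}}{\pi_{Cij}}\right) = \alpha_2 + \mathbf{X}\boldsymbol{\beta}_2 + \mathbf{Z}\mathbf{b}_2$$

$$\vdots$$

$$\eta_{(C-1)ij} = \log\left(\frac{\pi_{(C-1)ij}}{\pi_{Cij}}\right) = \alpha_{C-1} + \mathbf{X}\boldsymbol{\beta}_{C-1} + \mathbf{Z}\mathbf{b}_{C-1}$$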


Because each logit has its own set of effects, the intercepts ($\alpha$'s), fixed effects ($\boldsymbol{\beta}$'s), and random effects ($\mathbf{b}$'s) vary across the pairs of response categories, i.e., across link functions. Using algebra, it can be shown that the inverse link functions have the general form

$$\pi_{cij} = \frac{e^{\eta_{cij}}}{1 + \sum_{k=1}^{C-1} e^{\eta_{kij}}}, \quad c = 1, 2, \ldots, C-1.$$

Once $\pi_1, \pi_2, \ldots, \pi_{C-1}$ are estimated, the probability of the reference category is estimated as

$$\pi_C = 1 - \sum_{c=1}^{C-1} \pi_c.$$
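The inverse link above can be written as a short function. In the Python sketch below, the reference category is implicitly given a linear predictor of 0, so all C probabilities share the denominator $1 + \sum_{k=1}^{C-1} e^{\eta_k}$. The input values 4.8525 and 4.2485 are borrowed from the footpad lesion example in Section 8.6.1 (the intercepts in Table 8.21, i.e., the linear predictors under treatment 4); the function name is ours.

```python
from math import exp

def inverse_glogit(etas):
    """Category probabilities from the C-1 generalized logits eta_c = log(pi_c / pi_C).

    The reference category C implicitly has eta = 0, so every probability shares the
    denominator 1 + sum_k exp(eta_k). The reference probability is returned last.
    """
    denominator = 1.0 + sum(exp(eta) for eta in etas)
    probs = [exp(eta) / denominator for eta in etas]
    probs.append(1.0 / denominator)   # equals 1 - sum of the other probabilities
    return probs

probs = inverse_glogit([4.8525, 4.2485])
print([round(p, 4) for p in probs])
```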


8.6.1 CRDs with a Nominal Multinomial Response 


In practice, cumulative models are used for analyzing ordinal data and generalized logit models for nominal data. Returning to Example 8.3.1, we now fit a generalized logit model. This model relaxes the proportional odds assumption, but it is less parsimonious than the proportional odds model because it fits $C - 1$ binary logit models, each with its own parameters, where C is the number of categories of the response variable. The linear predictor and distribution are the same as in the previous example.

The following GLIMMIX syntax fits the generalized logit model:


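proc glimmix data=chickens;
  class trt block category;
  /* Generalized logit model; 'severe' is the reference response category */
  model category(reference='severe') = trt / dist=multinomial link=glogit oddsratio;
  random intercept / subject=block solution group=category;
  /* Linear predictor for each treatment; ILINK returns probabilities, BYCAT reports each category */
  estimate 't=1' intercept 1 trt 1 0 0 0,
           't=2' intercept 1 trt 0 1 0 0,
           't=3' intercept 1 trt 0 0 1 0,
           't=4' intercept 1 trt 0 0 0 1 / ilink bycat;
  freq y;
run;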
Table 8.20 Analysis of Type III tests of fixed effects in the generalized multinomial logit model

Type III tests of fixed effects
Effect   Num DF   Den DF   F-value   Pr > F
Trt      6        790      10.78     <0.0001


Table 8.21 Maximum likelihood estimates on the model scale (Estimate) for footpad lesion level in the multinomial generalized logit model

Solutions for fixed effects
Effect      Category          Trt  Estimate   Standard error  DF    t-value  Pr > |t|
Intercept   Without lesion         4.8525     1.0059          2     4.82     0.0404
Intercept   Moderate lesion        4.2485     1.0071          2     4.22     0.0519
trt         Without lesion    1    -3.8447    1.0330          790   -3.72    0.0002
trt         Moderate lesion   1    -2.6478    1.0327          790   -2.56    0.0105
trt         Without lesion    2    -1.1888    1.1618          790   -1.02    0.3065
trt         Moderate lesion   2    -0.9651    1.1662          790   -0.83    0.4082
trt         Without lesion    3    -2.7860    1.0585          790   -2.63    0.0087
trt         Moderate lesion   3    -1.8326    1.0598          790   -1.73    0.0842
trt         Without lesion    4    0          .               .     .        .
trt         Moderate lesion   4    0          .               .     .        .

Most of the syntax of the program has already been explained. The “reference=” option, new in this program, is specified for the response variable in the MODEL statement and designates the reference category. If the “reference=” option is not specified, GLIMMIX by default uses the last category in the response ordering as the reference. The “link=glogit” option prompts GLIMMIX to fit a generalized logit model. The “bycat” option in the “estimate” statement is specific to the generalized logit model and reports the estimates separately for each response category. Finally, the “ilink” option asks GLIMMIX to estimate the probabilities of all categories for each treatment, except that of the reference category. Part of the output is shown in Table 8.20. The test of fixed effects shows highly significant differences (P < 0.0001) in footpad lesion levels among treatments.

Unlike the cumulative logit model, in the generalized logit model the estimates of the fixed effects (treatments), as well as the intercepts, are separate for each link function. To estimate the linear predictors, we use the estimated values in Table 8.21 (“Solutions for fixed effects”). The estimated intercepts $\hat{\alpha}_0 = 4.8525$ and $\hat{\alpha}_1 = 4.2485$ are the log odds of a “Without” lesion and of a “Moderate” lesion, respectively, relative to the reference “Severe” lesion category under treatment 4, whose effects are set to zero. For treatment 1, the estimated treatment effect for the “Without” lesion category is $\hat{\tau}_{01} = -3.8447$, and for the “Moderate” lesion category it is $\hat{\tau}_{11} = -2.6478$. With these values, the linear predictors for the “Without” lesion and “Moderate” lesion categories under treatment 1 are $\hat{\eta}_{01} = 4.8525 - 3.8447 \approx 1.0077$ and $\hat{\eta}_{11} = 4.2485 - 2.6478 = 1.6007$, respectively.
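Applying the inverse generalized logit to these linear predictors gives category probabilities directly, not cumulative ones. The Python sketch below uses the rounded estimates of Table 8.21, sets the treatment 4 effects to zero as in that table, and computes the linear predictors and the three lesion probabilities for each treatment so that they can be compared with the “Estimate” and “Mean” columns of Table 8.22. The dictionary and function names are ours.

```python
from math import exp

# Rounded estimates from Table 8.21; treatment 4 is the reference level (effects = 0)
intercept = {"without": 4.8525, "moderate": 4.2485}
tau = {
    "without":  {1: -3.8447, 2: -1.1888, 3: -2.7860, 4: 0.0},
    "moderate": {1: -2.6478, 2: -0.9651, 3: -1.8326, 4: 0.0},
}

def lesion_probabilities(trt):
    """Linear predictors and category probabilities for one treatment (severe = reference)."""
    eta = {cat: intercept[cat] + tau[cat][trt] for cat in intercept}
    denominator = 1.0 + sum(exp(v) for v in eta.values())
    probs = {cat: exp(v) / denominator for cat, v in eta.items()}
    probs["severe"] = 1.0 / denominator
    return eta, probs

for trt in range(1, 5):
    eta, probs = lesion_probabilities(trt)
    print(trt, {k: round(v, 4) for k, v in eta.items()}, {k: round(v, 4) for k, v in probs.items()})
```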

The estimated probabilities of the non-reference categories (“Without” lesion and “Moderate” lesion) in each treatment are found under the “Mean” column of Table 8.22. Unlike in the cumulative model, these are category probabilities, not cumulative probabilities. Thus, the probability that a chick has no footpad lesion when receiving treatment 1 is $\hat{\pi}_{01} = 0.315$, and the probability that it has a moderate lesion is $\hat{\pi}_{11} = 0.570$. From these probabilities, the probability of observing a severe footpad lesion under treatment 1 is $\hat{\pi}_{21} = 1 - (0.315 + 0.570) = 0.115$. Following the same logic, we can estimate the reference-category probabilities for the other treatments.

Table 8.22 Estimates on the model scale (“Estimate”) and on the data scale (“Mean”) for footpad lesion level observed in treatments in the multinomial generalized logit model

Estimates
Label  Category          Estimate  Standard error  DF   t-value  Pr > |t|  Mean    Standard error mean
t=1    Without lesion    1.0077    0.2515          790  4.01     <0.0001   0.3150  0.03552
t=1    Moderate lesion   1.6007    0.2286          790  7.00     <0.0001   0.5700  0.03677
t=2    Without lesion    3.6637    0.5881          790  6.23     <0.0001   0.5850  0.03801
t=2    Moderate lesion   3.2834    0.5881          790  5.58     <0.0001   0.4000  0.03761
t=3    Without lesion    2.0665    0.3414          790  6.05     <0.0001   0.3929  0.03755
t=3    Moderate lesion   2.4159    0.3300          790  7.32     <0.0001   0.5573  0.03762
t=4    Without lesion    4.8525    1.0059          790  4.82     <0.0001   0.6433  0.03687
t=4    Moderate lesion   4.2485    1.0071          790  4.22     <0.0001   0.3517  0.03669

The odds ratio estimates, shown in Table 8.23, are another important result.

These odds ratios compare the odds for the labeled category with those for the reference category ("Severe" lesion) for treatments 1 to 3 relative to treatment 4. These odds ratio values are derived from the estimated probabilities in each of the categories. For example, under treatment 4, the probabilities that a chicken presents no lesion and a moderate lesion are $\hat{\pi}_{04} = 0.6433$ and $\hat{\pi}_{14} = 0.3517$, respectively. From these probabilities, we can estimate the probability of observing a severe lesion as $\hat{\pi}_{24} = 1 - (0.6433 + 0.3517) = 0.005$. The estimated odds ratio of not observing a lesion ("Without" lesion) between treatments 1 and 4 is

$$\widehat{\mathrm{OR}} = \frac{\hat{\pi}_{01}/\hat{\pi}_{21}}{\hat{\pi}_{04}/\hat{\pi}_{24}} = \frac{0.315/0.115}{0.6433/0.005} \approx 0.0213,$$

which, up to rounding, is the value (0.021) provided in the odds ratio estimates table. If we compare the analyses using the cumulative logit link and the generalized logit link, we observe negligible changes in the estimated category probabilities by treatment and in the significance level of the test of treatment effects.

Table 8.23 Estimated odds ratio

Odds ratio estimates
Category         Trt  _trt  Estimate  DF   95% Confidence limits
Without lesion   1    4     0.021     790  0.003  0.163
Moderate lesion  1    4     0.071     790  0.009  0.538
Without lesion   2    4     0.305     790  0.031  2.979
Moderate lesion  2    4     0.381     790  0.039  3.759
Without lesion   3    4     0.062     790  0.008  0.493
Moderate lesion  3    4     0.160     790  0.020  1.281
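Each generalized logit is the log of a category probability relative to the reference ("Severe") probability. Because of that, the odds ratios in Table 8.23 can also be read directly off the model-scale estimates in Table 8.22, without passing through the probabilities. The short worked check below assumes the same subscript convention as above: c = 0 for "Without", c = 1 for "Moderate", and c = 2 for "Severe".

```latex
\begin{aligned}
\hat{\eta}_{ci} &= \log\left(\hat{\pi}_{ci}/\hat{\pi}_{2i}\right)
\;\Longrightarrow\;
\widehat{\mathrm{OR}}_{c}(i \text{ vs. } 4)
  = \frac{\hat{\pi}_{ci}/\hat{\pi}_{2i}}{\hat{\pi}_{c4}/\hat{\pi}_{24}}
  = \exp\left(\hat{\eta}_{ci} - \hat{\eta}_{c4}\right) \\
\text{Without lesion, 1 vs. 4:}\quad &\exp(1.0077 - 4.8525) = \exp(-3.8448) \approx 0.021 \\
\text{Moderate lesion, 1 vs. 4:}\quad &\exp(1.6007 - 4.2485) = \exp(-2.6478) \approx 0.071 \\
\text{Without lesion, 2 vs. 4:}\quad &\exp(3.6637 - 4.8525) = \exp(-1.1888) \approx 0.305
\end{aligned}
```

The remaining rows of Table 8.23 follow in the same way, for example $\exp(2.4159 - 4.2485) \approx 0.160$ for "Moderate" lesion, 3 vs. 4.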


8.6.2 CRD: Cheese Tasting 


Consider a study designed to determine the effects of various additives on the flavor of cheese. Researchers tested 4 cheese additives and obtained 52 response ratings for each additive. Each response was measured on a 9-category scale ranging from "I dislike it very much" (1) to "I like it very much or excellent flavor" (9). The data are from McCullagh and Nelder (1989) (Table 8.24).

The components of the GLMM with an ordinal multinomial response are as follows:

Distributions: $\mathbf{y}_i = (y_{1i}, y_{2i}, \ldots, y_{9i}) \sim \text{Multinomial}(N_i, \pi_{1i}, \pi_{2i}, \ldots, \pi_{9i})$, where $y_{1i}, y_{2i}, \ldots, y_{9i}$ are the observed frequencies of the responses in each category c of the hedonic scale (1 = very undesirable, ..., 5 = neither desirable nor undesirable, ..., 9 = very desirable).

Linear predictor: $\eta_{ci} = \tau_c + \alpha_i$, where $\eta_{ci}$ is the cth link (c = 1, 2, ..., 8) for additive type i, $\tau_c$ is the intercept for the cth link, and $\alpha_i$ is the fixed effect due to the ith additive. The link functions for each category are as follows:

$$\log\left(\frac{\pi_{1i}}{1 - \pi_{1i}}\right) = \eta_{1i}$$

$$\log\left(\frac{\pi_{1i} + \pi_{2i}}{1 - (\pi_{1i} + \pi_{2i})}\right) = \eta_{2i}$$

$$\log\left(\frac{\pi_{1i} + \pi_{2i} + \pi_{3i}}{1 - (\pi_{1i} + \pi_{2i} + \pi_{3i})}\right) = \eta_{3i}$$

$$\vdots$$

$$\log\left(\frac{\pi_{1i} + \pi_{2i} + \cdots + \pi_{8i}}{1 - (\pi_{1i} + \pi_{2i} + \cdots + \pi_{8i})}\right) = \eta_{8i}$$

Table 8.24 Effect of additives on cheese flavor

Id  Additive  Y  Freq    Additive  Y  Freq
1   1         1  0       3         1  1
2   1         2  0       3         2  1
3   1         3  1       3         3  6
4   1         4  7       3         4  8
5   1         5  8       3         5  23
6   1         6  8       3         6  7
7   1         7  19      3         7  5
8   1         8  8       3         8  1
9   1         9  1       3         9  0
10  2         1  6       4         1  0
11  2         2  9       4         2  0
12  2         3  12      4         3  0
13  2         4  11      4         4  1
14  2         5  7       4         5  3
15  2         6  6       4         6  7
16  2         7  1       4         7  14
17  2         8  0       4         8  16
18  2         9  0       4         9  11
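A consequence of this parameterization is that the additive effect enters every cumulative logit with the same coefficient. The sketch below uses $\gamma_{ci}$ for the cumulative probability of category c or below; this symbol is assumed here for illustration and is not used elsewhere in the text. It shows two things. First, the log cumulative odds ratio between two additives does not depend on c, which is the proportional odds property. Second, the individual category probabilities follow by differencing.

```latex
\begin{aligned}
\gamma_{ci} &= \pi_{1i} + \cdots + \pi_{ci}, \qquad
  \log\frac{\gamma_{ci}}{1 - \gamma_{ci}} = \eta_{ci} = \tau_c + \alpha_i \\
\log\frac{\gamma_{ci}/(1 - \gamma_{ci})}{\gamma_{cj}/(1 - \gamma_{cj})}
  &= (\tau_c + \alpha_i) - (\tau_c + \alpha_j) = \alpha_i - \alpha_j,
  \qquad c = 1, \ldots, 8 \\
\pi_{ci} &= \gamma_{ci} - \gamma_{c-1,i}, \qquad \gamma_{0i} = 0, \quad \gamma_{9i} = 1
\end{aligned}
```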


The following GLIMMIX commands fit a cumulative logit model with an ordinal multinomial response.

proc glimmix;
   class id additive scale;
   /* Cumulative logit (proportional odds) model for the 9-point rating */
   model scale(order=data) = additive / dist=multinomial link=clogit
                                        solution oddsratio;
   /* Each row requests eta_ci = tau_c + alpha_i: the 8 intercept
      coefficients select tau_c and the 4 additive coefficients select
      alpha_i; ILINK also reports the value on the inverse-link
      (cumulative probability) scale */
   estimate 'c=1, a=1' intercept 1 0 0 0 0 0 0 0 additive 1 0 0 0,
            'c=2, a=1' intercept 0 1 0 0 0 0 0 0 additive 1 0 0 0,
            'c=3, a=1' intercept 0 0 1 0 0 0 0 0 additive 1 0 0 0,
            'c=4, a=1' intercept 0 0 0 1 0 0 0 0 additive 1 0 0 0,
            'c=5, a=1' intercept 0 0 0 0 1 0 0 0 additive 1 0 0 0,
            'c=6, a=1' intercept 0 0 0 0 0 1 0 0 additive 1 0 0 0,
            'c=7, a=1' intercept 0 0 0 0 0 0 1 0 additive 1 0 0 0,
            'c=8, a=1' intercept 0 0 0 0 0 0 0 1 additive 1 0 0 0,
            'c=1, a=2' intercept 1 0 0 0 0 0 0 0 additive 0 1 0 0,
            'c=2, a=2' intercept 0 1 0 0 0 0 0 0 additive 0 1 0 0,
            'c=3, a=2' intercept 0 0 1 0 0 0 0 0 additive 0 1 0 0,
            'c=4, a=2' intercept 0 0 0 1 0 0 0 0 additive 0 1 0 0,
            'c=5, a=2' intercept 0 0 0 0 1 0 0 0 additive 0 1 0 0,
            'c=6, a=2' intercept 0 0 0 0 0 1 0 0 additive 0 1 0 0,
            'c=7, a=2' intercept 0 0 0 0 0 0 1 0 additive 0 1 0 0,
            'c=8, a=2' intercept 0 0 0 0 0 0 0 1 additive 0 1 0 0,
            'c=1, a=3' intercept 1 0 0 0 0 0 0 0 additive 0 0 1 0,
            'c=2, a=3' intercept 0 1 0 0 0 0 0 0 additive 0 0 1 0,
            'c=3, a=3' intercept 0 0 1 0 0 0 0 0 additive 0 0 1 0,
            'c=4, a=3' intercept 0 0 0 1 0 0 0 0 additive 0 0 1 0,
            'c=5, a=3' intercept 0 0 0 0 1 0 0 0 additive 0 0 1 0,
            'c=6, a=3' intercept 0 0 0 0 0 1 0 0 additive 0 0 1 0,
            'c=7, a=3' intercept 0 0 0 0 0 0 1 0 additive 0 0 1 0,
            'c=8, a=3' intercept 0 0 0 0 0 0 0 1 additive 0 0 1 0,
            'c=1, a=4' intercept 1 0 0 0 0 0 0 0 additive 0 0 0 1,
            'c=2, a=4' intercept 0 1 0 0 0 0 0 0 additive 0 0 0 1,
            'c=3, a=4' intercept 0 0 1 0 0 0 0 0 additive 0 0 0 1,
            'c=4, a=4' intercept 0 0 0 1 0 0 0 0 additive 0 0 0 1,
            'c=5, a=4' intercept 0 0 0 0 1 0 0 0 additive 0 0 0 1,
            'c=6, a=4' intercept 0 0 0 0 0 1 0 0 additive 0 0 0 1,
            'c=7, a=4' intercept 0 0 0 0 0 0 1 0 additive 0 0 0 1,
            'c=8, a=4' intercept 0 0 0 0 0 0 0 1 additive 0 0 0 1 / ilink;
   freq freq;
run;

Table 8.25 Type III tests of fixed effects for the cumulative logit model

Type III tests of fixed effects
Effect    Num DF  Den DF  F-value  Pr > F
Additive  3       197     38.11    <0.0001


Part of the results is shown in Table 8.25. The test of fixed effects shows that the type of additive used in the manufacture of cheese significantly affects the degree of consumer acceptance (F = 38.11, P < 0.0001). That is, the type of additive affects the sensory characteristics of the cheese.

The hypothesis contrasts are presented in Table 8.26. The hypotheses tested are as follows:

$$H_0\colon \alpha_i = \alpha_j \quad \text{vs.} \quad H_1\colon \alpha_i \neq \alpha_j, \qquad i \neq j$$

All contrasts in Table 8.26 are significant (P < 0.0001), which shows that the additives provide different sensory characteristics that are reflected in the preference ratings.

Table 8.26 Contrasts of hypotheses in the acceptance of cheese made with four additives

Contrasts
Label                    Num DF  Den DF  F-value  Pr > F
Additive effect 1 vs. 2  1       197     61.13    <0.0001
Additive effect 1 vs. 3  1       197     21.19    <0.0001
Additive effect 2 vs. 3  1       197     19.14    <0.0001
Additive effect 2 vs. 4  1       197     108.45   <0.0001
Additive effect 3 vs. 4  1       197     62.04    <0.0001

With the "solution" option in the MODEL statement, Table 8.27 (fixed parameter estimates) shows the maximum likelihood estimates of the fixed effects parameters. In this table, the first estimated intercept, $\hat{\tau}_1 = -7.0802$, defines the boundary between categories "1" and "2," whereas $\hat{\tau}_2 = -6.0250$ defines the boundary between categories "2" and "3." The third intercept, $\hat{\tau}_3 = -4.9254$, defines the boundary between categories "3" and "4," and so forth. The estimated effects of the additive type ($\hat{\alpha}_i$, i = 1, 2, 3, and 4) are 1.6128, 4.9646, 3.3227, and 0, respectively. From these values, linear predictors are estimated for each of the categories.

Table 8.27 Maximum likelihood estimates of the fixed effects in the preference ratings of cheese made with different types of additives in the multinomial cumulative logit model

Fixed parameter estimates
Effect          Scale  Additive  Estimate  Standard error  DF   t-value  Pr > |t|
Intercept (τ1)  1                -7.0802   0.5640          197  -12.55   <0.0001
Intercept (τ2)  2                -6.0250   0.4764          197  -12.65   <0.0001
Intercept (τ3)  3                -4.9254   0.4257          197  -11.57   <0.0001
Intercept (τ4)  4                -3.8568   0.3880          197  -9.94    <0.0001
Intercept (τ5)  5                -2.5206   0.3453          197  -7.30    <0.0001
Intercept (τ6)  6                -1.5685   0.3122          197  -5.02    <0.0001
Intercept (τ7)  7                -0.06688  0.2738          197  -0.24    0.8073
Intercept (τ8)  8                1.4930    0.3357          197  4.45     <0.0001
Additive (α1)          1         1.6128    0.3805          197  4.24     <0.0001
Additive (α2)          2         4.9646    0.4767          197  10.41    <0.0001
Additive (α3)          3         3.3227    0.4218          197  7.88     <0.0001
Additive (α4)          4         0

For example, for cheese made with additive 1, the estimated linear predictor for category "1 = very undesirable" is $\hat{\eta}_{11} = \hat{\tau}_1 + \hat{\alpha}_1 = -7.0802 + 1.6128 = -5.4674$. For category 2, it is $\hat{\eta}_{21} = \hat{\tau}_2 + \hat{\alpha}_1 = -6.0250 + 1.6128 = -4.4122$. For category 3, it is $\hat{\eta}_{31} = \hat{\tau}_3 + \hat{\alpha}_1 = -4.9254 + 1.6128 = -3.3126$. For category 4, it is $\hat{\eta}_{41} = \hat{\tau}_4 + \hat{\alpha}_1 = -3.8568 + 1.6128 = -2.2440$. These values are shown in the "Estimate" column of Table 8.28. The linear predictors for the other categories and additives are calculated similarly.
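Each row of the ESTIMATE statement in the GLIMMIX program above encodes one of these linear predictors as a linear combination of the fixed effects. The worked sketch below writes the parameter vector as $\boldsymbol{\beta} = (\tau_1, \ldots, \tau_8, \alpha_1, \ldots, \alpha_4)'$ and uses $\hat{\gamma}$ for the cumulative probability reported with ILINK. Both notations are assumed here for illustration. It shows the coefficient vector of the row labeled 'c=2, a=1' and the value that row produces.

```latex
\begin{aligned}
\mathbf{l}_{21}' &= (\underbrace{0\;1\;0\;0\;0\;0\;0\;0}_{\text{intercept}}\;\;
                    \underbrace{1\;0\;0\;0}_{\text{additive}}) \\
\hat{\eta}_{21} &= \mathbf{l}_{21}'\hat{\boldsymbol{\beta}} = \hat{\tau}_2 + \hat{\alpha}_1
                 = -6.0250 + 1.6128 = -4.4122 \\
\hat{\gamma}_{21} &= \frac{1}{1 + \exp(-\hat{\eta}_{21})} = \frac{1}{1 + e^{4.4122}} \approx 0.01198
\end{aligned}
```

Setting the additive part to (0 0 0 1) selects the reference additive, whose estimate is 0. The rows for a = 4 therefore reproduce the intercepts themselves.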

The ESTIMATE statement, used in conjunction with the ILINK option, prompts GLIMMIX to calculate the linear predictors $\hat{\eta}_{ci}$, tabulated in the "Estimate" column of Table 8.28. It also calculates the estimated cumulative probabilities for all categories of each treatment except the last (reference) category, tabulated in the "Mean" column.

From Table 8.28 (Estimates), we obtain the probability for each category from the values reported under the "Mean" column. In this case, $\hat{\pi}_{11} = 0.004205$. This value is obtained by applying the inverse link to the linear predictor $\hat{\eta}_{11} = -5.4674$, that is, $\hat{\pi}_{11} = 1/(1 + \exp(5.4674)) = 0.004205$. To calculate the probability that a panelist assigns a rating of 2 to cheese made with additive 1, we use the cumulative probability 0.01198, which corresponds to $\hat{\pi}_{11} + \hat{\pi}_{21}$. From this value, we obtain $\hat{\pi}_{21} = 0.01198 - \hat{\pi}_{11} = 0.01198 - 0.004205 = 0.007775$. Similarly, the probability of assigning a rating of 3 to cheese made with additive 1 is $\hat{\pi}_{31} = 0.03514 - (\hat{\pi}_{11} + \hat{\pi}_{21}) = 0.03514 - 0.01198 = 0.02316$. Following the same procedure, we obtain the probabilities for the remaining categories of each additive used in the manufacture of cheese, which are tabulated in Table 8.29 and shown in Fig. 8.3.
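The differencing step can be carried through all nine categories of additive 1. The worked computation below uses only the rounded cumulative probabilities in the "Mean" column of Table 8.28, so the last digit may differ slightly from values computed at full precision. As elsewhere in this sketch, $\hat{\gamma}_{c1}$ is an assumed symbol for the cumulative probability of category c or below.

```latex
\begin{aligned}
\hat{\pi}_{c1} &= \hat{\gamma}_{c1} - \hat{\gamma}_{c-1,1}, \qquad \hat{\gamma}_{01} = 0, \quad \hat{\gamma}_{91} = 1 \\
\hat{\pi}_{11} &= 0.004205, \quad
\hat{\pi}_{21} = 0.01198 - 0.004205 = 0.007775, \quad
\hat{\pi}_{31} = 0.03514 - 0.01198 = 0.02316 \\
\hat{\pi}_{41} &= 0.09587 - 0.03514 = 0.06073, \quad
\hat{\pi}_{51} = 0.2875 - 0.09587 = 0.19163, \quad
\hat{\pi}_{61} = 0.5111 - 0.2875 = 0.2236 \\
\hat{\pi}_{71} &= 0.8243 - 0.5111 = 0.3132, \quad
\hat{\pi}_{81} = 0.9571 - 0.8243 = 0.1328, \quad
\hat{\pi}_{91} = 1 - 0.9571 = 0.0429
\end{aligned}
```

The nine values sum to 1 by construction. Ratings 5 to 7 carry the largest probabilities, which is consistent with the pattern described for additive 1 in Fig. 8.3.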

Figure 8.3 shows the estimated probability of each flavor rating for each additive (some probability labels were omitted from the plot to avoid overlap). Additive 1 primarily receives ratings of 5 to 7, additive 2 ratings of 2 to 5, additive 3 ratings of 4 to 6, and additive 4 ratings of 7 to 9.

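The odds ratio results (Table 8.30) show the preferences more clearly. For example, the odds ratio for additive 1 vs. 4 (5.017) indicates that the odds of cheese made with the first additive receiving a rating at or below any given category are 5.017 times the corresponding odds for the fourth additive. That is, the first additive tends to receive lower scores than the fourth.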
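Under proportional odds, these odds ratios depend only on the additive effects. The worked check below assumes two things. First, the reported limits are Wald-type t intervals on the log-odds scale. Second, they use the 197 denominator degrees of freedom shown in Table 8.30, with $t_{0.975,197} \approx 1.972$. Under these assumptions, the check reproduces the first row of Table 8.30 up to rounding. $\hat{\gamma}_{ci}$ again denotes the cumulative probability of category c or below (assumed notation).

```latex
\begin{aligned}
\widehat{\mathrm{OR}}(i \text{ vs. } 4)
  &= \frac{\hat{\gamma}_{ci}/(1 - \hat{\gamma}_{ci})}{\hat{\gamma}_{c4}/(1 - \hat{\gamma}_{c4})}
   = \exp(\hat{\alpha}_i - \hat{\alpha}_4) = \exp(\hat{\alpha}_i), \qquad \hat{\alpha}_4 = 0 \\
\widehat{\mathrm{OR}}(1 \text{ vs. } 4) &= \exp(1.6128) \approx 5.017 \\
95\%\ \text{CI} &= \exp\left(\hat{\alpha}_1 \pm t_{0.975,\,197}\,\mathrm{SE}(\hat{\alpha}_1)\right)
   \approx \exp(1.6128 \pm 1.972 \times 0.3805) \\
  &= \left(e^{0.8625},\; e^{2.3631}\right) \approx (2.369,\; 10.62)
\end{aligned}
```

The same formula gives $\exp(4.9646) \approx 143.3$ and $\exp(3.3227) \approx 27.735$ for additives 2 and 3. Small differences from Table 8.30 come from rounding of the $\hat{\alpha}_i$.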


8.7 Exercises 


Exercise 8.7.1 The dataset for this exercise corresponds to the results of 9 judges who rated 2 classes of wine, namely, white wine (WW = 1) and red wine (RW = 2). Within each wine class, they rated 10 wines on a scale of 1 to 20 points. The minimum rating for a particular wine was 7, and the maximum rating was 19.5. For didactic purposes, ratings between 7 and 11 were classified as low quality, ratings between 12 and 15 as medium quality, and ratings above 15 as excellent quality. The data of the wine evaluation experiment are shown in Table 8.31 under the columns "Judge" (wine evaluator panelist), "Wine type" (white wine: 1, red wine: 2), "Quality" (low, medium, and excellent), and the frequency of the observed qualities ("y").


(a) Fit the cumulative logit proportional odds model to these data. Perform a 
complete and appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments
(ii) Interpretation of the odds ratios
(iii) The expected probability per category for each treatment
Table 8.28 Estimates on the model scale (Estimate) and on the data scale (Mean) based on judges' preference ratings of cheese made with different types of additives in the multinomial cumulative logit model

Estimates
Label     Estimate  Standard error  DF   t-value  Pr > |t|  Mean      Standard error mean
c=1, a=1  -5.4674   0.5236          197  -10.44   <0.0001   0.004205  0.002192
c=2, a=1  -4.4122   0.4278          197  -10.31   <0.0001   0.01198   0.005064
c=3, a=1  -3.3126   0.3700          197  -8.95    <0.0001   0.03514   0.01255
c=4, a=1  -2.2440   0.3267          197  -6.87    <0.0001   0.09587   0.02832
c=5, a=1  -0.9078   0.2833          197  -3.20    0.0016    0.2875    0.05804
c=6, a=1  0.04425   0.2646          197  0.17     0.8673    0.5111    0.06611
c=7, a=1  1.5459    0.3017          197  5.12     <0.0001   0.8243    0.04369
c=8, a=1  3.1058    0.4057          197  7.65     <0.0001   0.9571    0.01665
c=1, a=2  -2.1155   0.4106          197  -5.15    <0.0001   0.1076    0.03942
c=2, a=2  -1.0603   0.3009          197  -3.52    0.0005              0.05749
c=3, a=2  0.03922   0.2735          197  0.14     0.8861              0.06836
c=4, a=2  1.1078    0.2969          197  3.73     0.0002              0.05542
c=5, a=2  2.4441    0.3397          197  7.19     <0.0001             0.02497
c=6, a=2  3.3961    0.3724          197  9.12     <0.0001             0.01168
c=7, a=2  4.8978    0.4249          197  11.53    <0.0001             0.003124
c=8, a=2  6.4576    0.5045          197  12.80    <0.0001             0.000789
c=1, a=3  -3.7575   0.4761          197  -7.89    <0.0001   0.02281   0.01061
c=2, a=3  -2.7023   0.3677          197  -7.35    <0.0001   0.06284   0.02165
c=3, a=3  -1.6027   0.3001          197  -5.34    <0.0001   0.1676    0.04186
c=4, a=3  -0.5341   0.2556          197  -2.09    0.0379    0.3696    0.05955
c=5, a=3  0.8021    0.2610          197  3.07     0.0024    0.6904    0.05579
c=6, a=3  1.7541    0.2984          197  5.88     <0.0001   0.8525    0.03752
c=7, a=3  3.2558    0.3618          197  9.00     <0.0001   0.9629    0.01293
c=8, a=3  4.8157    0.4528          197  10.63    <0.0001   0.9920    0.003610
c=1, a=4  -7.0802   0.5640          197  -12.55   <0.0001   0.000841  0.000474
c=2, a=4  -6.0250   0.4764          197  -12.65   <0.0001   0.002412  0.001146
c=3, a=4  -4.9254   0.4257          197  -11.57   <0.0001   0.007207  0.003046
c=4, a=4  -3.8568   0.3880          197  -9.94    <0.0001   0.02070   0.007865
c=5, a=4  -2.5206   0.3453          197  -7.30    <0.0001   0.07443   0.02379
c=6, a=4  -1.5685   0.3122          197  -5.02    <0.0001   0.1724    0.04455
c=7, a=4  -0.06688  0.2738          197  -0.24    0.8073    0.4833    0.06838
c=8, a=4  1.4930    0.3357          197  4.45     <0.0001   0.8165    0.05029


Exercise 8.7.2 Data were obtained from a series of experiments conducted to reduce damage to potato tubers caused by a potato lifter. The experiments were conducted at the Institute of Agricultural Engineering (IMAG-DLO) in Wageningen, the Netherlands. One source of damage was the type of rod used in the lifter. In the experiment under consideration, eight types of rods were compared. It is an empirical fact that the degree of damage varies considerably between potato varieties and with the type of rope used in the lifting of full potato sacks. Three blocks of observations were obtained for the combinations of varieties and rope types. Most of the combinations involved about 20 potatoes. For some combinations, there are no data because of an insufficient number of large potatoes. Tubers were dropped from a given height. To determine the damage, all tubers were peeled, and the degree of blue coloration was classified into one of four classes (class 1: no damage; class 2: slight damage; class 3: moderate damage; and class 4: severe damage). The observations, in the form of counts per class and combination, are shown in Table 8.32 of the tuber experiment. Its columns are "Variety" (1, 2, 3, 4, 5, 6), "String" (1, 2, 3, 4, 5, 6, 7, 8), "Block" (1, 2, 3), type of damage (sd = no damage, dl = light damage, dm = moderate damage, ds = severe damage), and the observed frequency (Y).
Fig. 8.3 Estimated probabilities for the categories of acceptability of the cheese according to the type of additive

Table 8.30 Odds ratio

Odds ratio estimates
Additive  _Additive  Estimate  DF   95% Confidence limits
1         4          5.017     197  2.369   10.625
2         4          143.257   197  55.953  366.783
3         4          27.735    197  12.071  63.724

(a) List the components of the multinomial GLMM.
(b) Fit the cumulative logit proportional odds model to these data. Perform a complete and appropriate analysis of the data, focusing on:

(i) An evaluation of the effects of the combination of treatments
(ii) Interpretation of the odds ratios
(iii) The expected probability per category for each treatment

(c) Test whether the proportional odds assumption is viable. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

(d) If, as a result of (c), you consider an alternative cumulative logit model to be better, revise your analysis in (b) accordingly.


Exercise 8.7.3 Fit a generalized multinomial logit model to the dataset from Exercise 8.7.2 of this section, following these instructions:

(a) List the components of this model.
(b) Perform a thorough and appropriate analysis of the data, focusing on:

(i) An evaluation of the main effects and treatment interaction
(ii) Odds ratio interpretation
(iii) The expected probability per category for each treatment

(c) Comment on and discuss your results. Cite relevant evidence to support your conclusion regarding the adequacy of the assumption.

Table 8.31 Results of the wine evaluation experiment

Judge  Wine type  Quality  y


Exercise 8.7.4 In this exercise, the effects of judges' experience on quality ratings of canned beans are assessed. Canning quality is one of the most essential traits required in all new dry bean (Phaseolus vulgaris L.) varieties, and selection for this trait is a critical part of bean breeding programs. Advanced lines, which are candidates for release as varieties, must be evaluated for canning quality for at least 3 years from samples grown at different locations. Quality is evaluated by a panel of judges with varying levels of experience in evaluating breeding lines for visual quality traits. In all, 264 bean breeding lines from 4 commercial classes were canned according to the procedures described by Walters et al. (1997).

These included 62 white (navy), 65 black, 55 kidney, and 82 pinto bean lines plus control lines and "checks." The visual appearance of the processed beans was determined subjectively by a panel of 13 judges on a 7-point hedonic scale (1 = very undesirable, 4 = neither desirable nor undesirable, 7 = very desirable). The beans were presented to the panel of judges in random order at the same time. Prior to evaluating the samples, all judges were shown examples of samples rated as satisfactory (4). There is concern that certain judges, due to lack of experience, may not be able to score canned samples correctly.

From attribute-based product evaluations, inferences about the effects of experience can be drawn from the psychology literature (Wallsten and Budescu 1981). Prior to the bean canning quality rating experiment, it was postulated not only that less experienced judges give more severe ratings than more experienced judges but also that experience should have little or no effect on the white beans for which the canning procedure was developed. For the purpose of analysis, judges are stratified by experience (less than 5 years, greater than 5 years).

Table 8.32 Results of the tuber experiment. V = variety, C = string, B = block, D = damage (sd = no damage, dl = slight damage, dm = moderate damage, ds = severe damage), and Y = observed frequency

Table 8.32 (continued)

V  C  B  D   Y     V  C  B  D   Y     V  C  B  D   Y
4  2  3  ds  0     5  2  3  ds  0     6  2  3  ds  0
4  3  3  sd  15    5  3  3  sd  7     6  3  3  sd  2
4  3  3  dl  4     5  3  3  dl  12    6  3  3  dl  18
4  3  3  dm  0     5  3  3  dm  1     6  3  3  dm  0
4  3  3  ds  0     5  3  3  ds  0     6  3  3  ds  0
4  4  3  sd  5     5  4  3  sd  6     6  4  3  sd  4
4  4  3  dl  11    5  4  3  dl  6     6  4  3  dl  10
4  4  3  dm  4     5  4  3  dm  6     6  4  3  dm  6
4  4  3  ds  0     5  4  3  ds  2     6  4  3  ds  0
4  5  3  sd  17    5  5  3  sd  16    6  5  3  sd  6
4  5  3  dl  2     5  5  3  dl  4     6  5  3  dl  13
4  5  3  dm  0     5  5  3  dm  0     6  5  3  dm  1
4  5  3  ds  0     5  5  3  ds  0     6  5  3  ds  0
4  6  3  sd  16    5  6  3  sd  17    6  6  3  sd  18
4  6  3  dl  2     5  6  3  dl  3     6  6  3  dl  2
4  6  3  dm  0     5  6  3  dm  0     6  6  3  dm  0
4  6  3  ds  0     5  6  3  ds  0     6  6  3  ds  0
4  7  3  sd  17    5  7  3  sd  18    6  7  3  sd  15
4  7  3  dl  2     5  7  3  dl  2     6  7  3  dl  5
4  7  3  dm  1     5  7  3  dm  0     6  7  3  dm
4  7  3  ds  0     5  7  3  ds  0     6  7  3  ds  0
4  8  3  sd  17    5  8  3  sd  17    6  8  3  sd  19
4  8  3  dl  25    5  8  3  dl  2     6  8  3  dl  1
4  8  3  dm  0     5  8  3  dm  0     6  8  3  dm  0
4  8  3  ds  0     5  8  3  ds  0     6  8  3  ds  0

Counts by canning quality, judge experience, and bean breeding lines are listed in 
the following table (Table 8.33). 
Table 8.33 Bean experiment results

           Black               Kidney              Navies              Pinto
Quality    <5 years  >5 years  <5 years  >5 years  <5 years  >5 years  <5 years  >5 years


(a) Fit the generalized logit model to these data. Perform a complete and appropriate 
analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments
(ii) Interpretation of the odds ratios 
(iii) The expected probability per category for each treatment 


(b) Test whether the proportional odds assumption is viable. Cite relevant evidence 
to support your conclusion regarding the adequacy of the assumption. 


Exercise 8.7.5 An experiment was conducted to examine the damage levels (ordinal categories 0 to 4) of Picea sitchensis shoots in two time periods (10 November and 8 December), at four temperatures (different on each date), and at four ozone levels (Table 8.34).


(a) Fit the cumulative logit proportional odds model to these data. Perform a 
complete and appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments 
(ii) Interpretation of the odds ratios 
(iii) The expected probability per category for each treatment 


(b) Test whether the proportional odds assumption is viable. Cite relevant evidence 
to support your conclusion regarding the adequacy of the assumption. 
Table 8.34 Experimental results of Picea sitchensis shoots

Time  Temperature (°C)  Ozone  Category  Frequency
1     -9                170    0         1
1     -9                170    1         10
1     -9                170    2         2
1     -9                170    3         2
1     -9                170    4         0
1     -12               170    0         0
1     -12               170    1         8
1     -12               170    2         3
1     -12               170    3         1
1     -12               170    4         3
1     -15               170    0         0
1     -15               170    1         3
1     -15               170    2         2
1     -15               170    3         4
1     -15               170    4         6
1     -18               170    0         0
1     -18               170    1         1
1     -18               170    2         1
1     -18               170    3         4
1     -18               170    4         9
1     -9                120    0         1
1     -9                120    1         9
1     -9                120    2         4
1     -9                120    3         1
1     -9                120    4         0
1     -12               120    0         0
1     -12               120    1         7
1     -12               120    2         7
1     -12               120    3         1
1     -12               120    4         0
1     -15               120    0         0
1     -15               120    1         1
1     -15               120    2         5
1     -15               120    3         6
1     -15               120    4         3
1     -18               120    0         0
1     -18               120    1         0
1     -18               120    2         4
1     -18               120    3         5
1     -18               120    4         6
1     -9                70     0         4
1     -9                70     1         6

Table 8.34 (continued)

Table 8.34 (continued)

Time  Temperature (°C)  Ozone  Category  Frequency
2     -15               170    4         3
2     -19               170    0         0
2     -19               170    1         10
2     -19               170    2         5
2     -19               170    3         0
2     -19               170    4         4
2     -23               170    0         0
2     -23               170    1         1
2     -23               170    2         8
2     -23               170    3         4
2     -23               170    4         6
2     -27               170    0         0
2     -27               170    1         0
2     -27               170    2         2
2     -27               170    3         3
2     -27               170    4         14
2     -15               120    0         6
2     -15               120    1         6
2     -15               120    2         8
2     -15               120    3         0
2     -15               120    4         0
2     -19               120    0         1
2     -19               120    1         12
2     -19               120    2         7
2     -19               120    3         0
2     -19               120    4         0
2     -23               120    0         0
2     -23               120    1         0
2     -23               120    2         7
2     -23               120    3         7
2     -23               120    4         6
2     -27               120    0         0
2     -27               120    1         0
2     -27               120    2         1
2     -27               120    3         2
2     -27               120    4         17
2     -15               70     0         9
2     -15               70     1         4
2     -15               70     2         5
2     -15               70     3         2
2     -15               70     4         0
2     -19               70     0         2

Table 8.34 (continued)
Appendix

Data: CRD with a multinomial response: ordinal

Rep   Trt   Cat       Freq    Rep   Trt   Cat       Freq
rep1  M1H1  Without   0       rep1  M2H3  Moderate  3
rep2  M1H1  Without   2       rep2  M2H3  Moderate  1
rep3  M1H1  Without   2       rep3  M2H3  Moderate  3
rep4  M1H1  Without   2       rep4  M2H3  Moderate  2
rep1  M1H2  Without   2       rep1  M2H4  Moderate  4
rep2  M1H2  Without   0       rep2  M2H4  Moderate  2
rep3  M1H2  Without   4       rep3  M2H4  Moderate  2
rep4  M1H2  Without   2       rep4  M2H4  Moderate  5
rep1  M1H3  Without   3       rep1  M2H1  Severe    4
rep2  M1H3  Without   7       rep2  M2H1  Severe    6
rep3  M1H3  Without   1       rep3  M2H1  Severe    7
rep4  M1H3  Without   2       rep4  M2H1  Severe    4
rep1  M1H4  Without   0       rep1  M2H2  Severe    5
rep2  M1H4  Without   5       rep2  M2H2  Severe    2
rep3  M1H4  Without   2       rep3  M2H2  Severe    3
rep4  M1H4  Without   1       rep4  M2H2  Severe    4
rep1  M1H1  Moderate  3       rep1  M2H3  Severe    3
rep2  M1H1  Moderate  2       rep2  M2H3  Severe    4
rep3  M1H1  Moderate  3       rep3  M2H3  Severe    4
rep4  M1H1  Moderate  5       rep4  M2H3  Severe    4
rep1  M1H2  Moderate  3       rep1  M2H4  Severe    5
rep2  M1H2  Moderate  3       rep2  M2H4  Severe    6
rep3  M1H2  Moderate  6       rep3  M2H4  Severe    0
rep4  M1H2  Moderate  3       rep4  M2H4  Severe    3
rep1  M1H3  Moderate  4       rep1  M3H1  Without   0
rep2  M1H3  Moderate  2       rep2  M3H1  Without   3
rep3  M1H3  Moderate  1       rep3  M3H1  Without   2
rep4  M1H3  Moderate  3       rep4  M3H1  Without   0
rep1  M1H4  Moderate  5       rep1  M3H2  Without   5
rep2  M1H4  Moderate  4       rep2  M3H2  Without   3
rep3  M1H4  Moderate  8       rep3  M3H2  Without   3
rep4  M1H4  Moderate  4       rep4  M3H2  Without   2
rep1  M1H1  Severe    6       rep1  M3H3  Without   0
rep2  M1H1  Severe    6       rep2  M3H3  Without   2
rep3  M1H1  Severe    5       rep3  M3H3  Without   1
rep4  M1H1  Severe    3       rep4  M3H3  Without   0
rep1  M1H2  Severe    5       rep1  M3H4  Without   3
rep2  M1H2  Severe    7       rep2  M3H4  Without   5
rep3  M1H2  Severe    0       rep3  M3H4  Without   7
rep4  M1H2  Severe    5       rep4  M3H4  Without   3
Data: CRD with a multinomial response: ordinal

Rep   Trt   Cat       Freq    Rep   Trt   Cat       Freq
rep1  M1H3  Severe    3       rep1  M3H1  Moderate  0
rep2  M1H3  Severe    1       rep2  M3H1  Moderate  5
rep3  M1H3  Severe            rep3  M3H1  Moderate  5
rep4  M1H3  Severe    5       rep4  M3H1  Moderate  0
rep1  M1H4  Severe    5       rep1  M3H2  Moderate  3
rep2  M1H4  Severe    1       rep2  M3H2  Moderate  2
rep3  M1H4  Severe    0       rep3  M3H2  Moderate  6
rep4  M1H4  Severe    5       rep4  M3H2  Moderate  1
rep1  M2H1  Without   1       rep1  M3H3  Moderate  3
rep2  M2H1  Without   2       rep2  M3H3  Moderate  5
rep3  M2H1  Without   1       rep3  M3H3  Moderate  3
rep4  M2H1  Without   1       rep4  M3H3  Moderate  3
rep1  M2H2  Without   1       rep1  M3H4  Moderate  0
rep2  M2H2  Without   3       rep2  M3H4  Moderate  2
rep3  M2H2  Without   1       rep3  M3H4  Moderate  3
rep4  M2H2  Without   4       rep4  M3H4  Moderate  4
rep1  M2H3  Without   4       rep1  M3H1  Severe    9
rep2  M2H3  Without   5       rep2  M3H1  Severe    2
rep3  M2H3  Without   3       rep3  M3H1  Severe    3
rep4  M2H3  Without   4       rep4  M3H1  Severe    10
rep1  M2H4  Without   1       rep1  M3H2  Severe    2
rep2  M2H4  Without   1       rep2  M3H2  Severe    5
rep3  M2H4  Without   8       rep3  M3H2  Severe    1
rep4  M2H4  Without   2       rep4  M3H2  Severe    7
rep1  M2H1  Moderate  4       rep1  M3H3  Severe    6
rep2  M2H1  Moderate  2       rep2  M3H3  Severe    3
rep3  M2H1  Moderate  2       rep3  M3H3  Severe    6
rep4  M2H1  Moderate  5       rep4  M3H3  Severe    7
rep1  M2H2  Moderate  4       rep1  M3H4  Severe    7
rep2  M2H2  Moderate  4       rep2  M3H4  Severe    3
rep3  M2H2  Moderate  6       rep3  M3H4  Severe    0
rep4  M2H2  Moderate  2       rep4  M3H4  Severe    3
Chapter 9
Generalized Linear Mixed Models for Repeated Measurements


9.1 Introduction 


Repeated measures data, also known as longitudinal data, are derived from experiments in which observations are made on the same experimental units at various planned times. These experiments can be of the regression or analysis of variance (ANOVA) type and can contain two or more treatments. They are set up using familiar designs, such as the completely randomized design (CRD), the randomized complete block design (RCBD), or randomized incomplete block designs if blocking is appropriate, or row and column designs such as Latin squares when appropriate. Repeated measures designs are widely used in the biological sciences and are fairly well understood for normally distributed data but less so for binary, ordinal, count, and other non-normal data. Nevertheless, recent developments in statistical computing methodology and software have greatly increased the number of tools available for analyzing categorical data.

A generalized linear mixed model (GLMM) is one of the most useful and sophisticated structures in modern statistics, as it allows complex structures to be incorporated into the framework of a generalized linear model. Fitting such models has been the subject of much research over the last three decades. GLMMs for repeated measures combine generalized linear model (GLM) theory (e.g., for a binomial, multinomial, or Poisson response variable) with linear mixed effects models.

Experimentation is sometimes misunderstood because researchers believe that it involves only the manipulation of the levels of independent variables and the observation of subsequent responses in dependent variables. Independent variables whose levels are determined or set by the experimenter are said to have fixed effects. Random effects are also very common; their levels are assumed to be randomly selected from an infinite population of possible levels. Many variables of interest in research are not fully amenable to experimental manipulation but can nevertheless be studied by treating them as random effects. For example, the genetic composition of individuals of a species cannot be manipulated experimentally, but it is of great interest to geneticists aiming to assess the genetic contribution to individual variation in specific behaviors.

A GLMM with repeated measures is a generalization of the standard linear model. This generalization is due to (1) the presence of more than one response variable, which can be binary, ordinal, count, and so on, and (2) the nonconstant correlation and/or variability exhibited by the data. The linear mixed model therefore gives you the flexibility to model not only the means of your data (as in the standard linear model) but also their variances and covariances. Usually, a normal distribution is assumed for random effects. Since normally distributed data can be modeled entirely in terms of their means and variances/covariances, the two sets of parameters in a linear mixed model actually specify the full probability distribution of the data. The parameters of the mean structure in the model are known as fixed effects parameters, which can be qualitative (as in traditional analysis of variance) or quantitative (as in standard regression). The parameters of the variance-covariance structure of the model are known as covariance parameters, and they distinguish a linear mixed model from the standard linear model. Covariance parameters arise frequently in applications, most typically in the following two scenarios:


(a) Experimental units on which data are measured can be grouped into clusters, and 
data from a common cluster are correlated. 

(b) Repeated measurements are taken on the same experimental unit, and these
repeated measurements are correlated or show nonconstant variability.


The first scenario can be generalized to include a set of clusters nested within one 
another. For example, if students are the experimental unit, they can be grouped into 
classes (clusters), which, in turn, can be grouped into schools. Each level of this 
hierarchy may present an additional source of variability and correlation. The second 
scenario occurs in longitudinal studies, in which repeated measurements of the same 
experimental unit over time are taken. Alternatively, these repeated measures could 
be spatial or multivariate. 


9.2 Example of Turf Quality 


The proportional odds model, introduced by McCullagh (1980), was proposed as an
extension of the generalized linear model for ordinal responses. Recall that the
proportional odds model is a special case of a GLM with a cumulative link function,
in which the probability that an observation falls into a given category or below is
modeled. With a logit link and only two categories (a binary response), the
proportional odds model reduces to standard logistic regression (a classification
model). As with any other type of response variable, repeated measurements are
common in agronomic research. They result in clustered data structures with
correlations between repeated observations on the same experimental unit that must
be taken into account in the analysis.
Table 9.1 Turf quality of five grass varieties: number of plots rated Low, Med
(medium), and Excel (excellent) in each month (Sept = September)

                       May              July             Sept
Variety  No. of plots  Low  Med  Excel  Low  Med  Excel  Low  Med  Excel
1        18            4    10   4      1    9    8      0    12   6
2        17            2    11   4      0    7    10     0    9    8
3        17            2    11   4      2    8    7      2    11   4
4        18            8    7    3      4    8    6      4    13   1
5        18            1    11   6      3    4    11     3    6    9


The data were obtained from an experiment studying the turf quality of five grass
varieties. The varieties were sown independently in 17 or 18 plots. The plots
(experimental units) were evaluated in May, July, and September of the growing
season, and turf quality was classified on an ordinal scale into three categories:
low, medium, and excellent quality, as shown in Table 9.1.

The components of the GLMM with repeated measures and an ordinal multinomial
response are as follows:

Distributions: $(y_{1ij}, y_{2ij}, y_{3ij}) \mid p_{ij} \sim \text{Multinomial}(N_{ij}, \pi_{1ij}, \pi_{2ij}, \pi_{3ij})$,
where $y_{1ij}$, $y_{2ij}$, and $y_{3ij}$ are the observed frequencies of the responses
(turf quality) in each category $c$ (low, medium, and excellent), $N_{ij}$ is the
number of plots evaluated, and $p_{ij}$ is the random effect due to the combination
variety × month (measurement time), assuming $p_{ij} \sim N(0, \sigma^2_{\text{variety}\times\text{month}})$.

Linear predictor: $\eta_{cij} = \eta_c + \tau_i + p_{ij}$, where $\eta_{cij}$ is the $c$th link
($c = 1, 2$) in the $ij$th combination variety × month, $\eta_c$ is the intercept for the
$c$th link, $\tau_i$ is the fixed effect due to the $i$th treatment (variety), and $p_{ij}$ is
the random effect due to the $ij$th measurement of variety × month,
$p_{ij} \sim N(0, \sigma^2_{\text{variety}\times\text{month}})$. The link functions for each category are as
follows:

$$\eta_{1ij} = \log\left(\frac{\pi_{1ij}}{1 - \pi_{1ij}}\right), \qquad \eta_{2ij} = \log\left(\frac{\pi_{1ij} + \pi_{2ij}}{1 - (\pi_{1ij} + \pi_{2ij})}\right)$$

The following Statistical Analysis Software (SAS) program fits a repeated measures
GLMM with an ordinal response.

proc glimmix data=turfgrass method=laplace;
   class Variety time;
   model cat(order=data)=variety|time/dist=multinomial link=clogit
         solution oddsratio;
   random intercept/subject=variety type=cs solution;
   estimate 'c=1, var=1' intercept 1 0 variety 1 0 0 0 0,
            'c=2, var=1' intercept 0 1 variety 1 0 0 0 0,
            'c=1, var=2' intercept 1 0 variety 0 1 0 0 0,
            'c=2, var=2' intercept 0 1 variety 0 1 0 0 0,
            'c=1, var=3' intercept 1 0 variety 0 0 1 0 0,
            'c=2, var=3' intercept 0 1 variety 0 0 1 0 0,
            'c=1, var=4' intercept 1 0 variety 0 0 0 1 0,
            'c=2, var=4' intercept 0 1 variety 0 0 0 1 0,
            'c=1, var=5' intercept 1 0 variety 0 0 0 0 1,
            'c=2, var=5' intercept 0 1 variety 0 0 0 0 1/ilink;
   freq y;
run;

Table 9.2 Fit statistics under different covariance structures

Fit statistic             CS       AR(1)    Toep(1)  UN
-2 Log likelihood         497.38   497.46   497.37
AIC (smaller is better)   513.38   513.46   511.37
AICC (smaller is better)  513.94   514.02   511.81
BIC (smaller is better)   510.26   510.34   508.64
CAIC (smaller is better)  518.26   518.34   515.64
HQIC (smaller is better)  504.99   505.07   504.03

CAIC = consistent Akaike's information criterion; HQIC = Hannan-Quinn information criterion.


Mixed models have advantages over fixed-effects linear models (Littell et al. 1996)
because they incorporate both fixed ($X\beta$) and random ($Zb$) effects, which allows
us to try different variance-covariance structures for repeated measures
experiments (with or without missing data) and see which covariance structure best
fits the model (Henderson 1984; Smith et al. 2005). Building a good model therefore
involves selecting the covariance structure that best fits the dataset. The fit
statistics provided by PROC GLIMMIX, namely minus twice the log likelihood (-2LL),
the Akaike information criterion (AIC), the corrected Akaike information criterion
(AICC), the Bayesian information criterion (BIC), and others, are used as
statistical measures of fit to select the variance structure (compound symmetry
("CS"), first-order autoregressive ("AR(1)"), Toeplitz ("Toep(1)"), or unstructured
("UN")) that best models the dataset.
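The penalized criteria in Table 9.2 can be recomputed from -2 log likelihood alone. The sketch below is a minimal check, assuming the standard definitions (AIC = -2LL + 2k, BIC = -2LL + k log n, CAIC = -2LL + k(log n + 1), HQIC = -2LL + 2k log log n), with n taken as the number of subjects (the five varieties) and the parameter counts k inferred from the gap between AIC and -2LL in the table. These are assumptions of the sketch, not values stated by the source. AICC is left out because it depends on a sample size that is not reported here.

```python
import math


def information_criteria(minus2ll, k, n):
    """Penalized fit statistics from -2 log likelihood.

    minus2ll: -2 log likelihood of the fitted model
    k:        number of estimated parameters (inferred, see text)
    n:        sample size used in the BIC-type penalties (assumed: subjects)
    """
    log_n = math.log(n)
    return {
        "AIC": minus2ll + 2 * k,
        "BIC": minus2ll + k * log_n,
        "CAIC": minus2ll + k * (log_n + 1),
        "HQIC": minus2ll + 2 * k * math.log(log_n),
    }


# -2 log likelihood from Table 9.2; k inferred from AIC - (-2LL)
fits = {"CS": (497.38, 8), "AR(1)": (497.46, 8), "Toep(1)": (497.37, 7)}

for structure, (minus2ll, k) in fits.items():
    ic = information_criteria(minus2ll, k, n=5)  # n = 5 varieties (subjects)
    print(structure, {name: round(value, 2) for name, value in ic.items()})
```

Under these assumptions the printed values agree with Table 9.2 to two decimals, and the smallest AIC, BIC, CAIC, and HQIC all point to Toep(1).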

Most of the statements have already been explained. To choose the correlation
structure to be modeled with the program above, vary the TYPE= option of the
RANDOM statement (CS, AR(1), Toep(1), and UN), running the program once for each
of these covariance structures. Part of the results is shown below.

According to the fit statistics (Table 9.2), the covariance structure that best fits the
dataset is Toeplitz of order 1 (Toep(1)). The type III tests of fixed effects, shown in
Table 9.3 part (a), indicate that the grass varieties provide different turfgrass
qualities (P = 0.0202). The SOLUTION option in the MODEL statement provides the
solutions for the fixed effects of the model (intercepts and treatments), which we use
to estimate the linear predictors $\hat{\eta}_{ci} = \hat{\eta}_c + \widehat{\text{Variety}}_i$ (part (b)).

The inverse-linked estimates obtained with the ESTIMATE statements, which are
cumulative probabilities, are tabulated under the "Mean" column of Table 9.4.

Table 9.3 Results of the analysis of variance

(a) Type III tests of fixed effects

Effect   Num DF  Den DF  F value  Pr > F
Variety  4       10      4.80     0.0202

(b) Solutions for fixed effects

Effect     Cat     Variety  Estimate  Standard error  DF  t value  Pr > |t|
Intercept  Low              -2.4509   0.3219          10  -7.61    <0.0001
Intercept  Medium           0.1961    0.2721          10  0.72     0.4875
Variety            Var1     0.4261    0.3753          10  1.14     0.2827
Variety            Var2     -0.01502  0.3785          10  -0.04    0.9691
Variety            Var3     0.6125    0.3825          10  1.60     0.1404
Variety            Var4     1.4904    0.3943          10  3.78     0.0036
Variety            Var5     0

Table 9.4 Estimated linear predictors and means on the model scale (Estimate) and on
the data scale (Mean) for observed turfgrass quality in grass varieties in the
multinomial cumulative logit model

Label       Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
c=1, var=1  -2.0248   0.3018          10  -6.71    <0.0001   0.1166   0.03110
c=2, var=1  0.6222    0.2646          10  2.35     0.0405    0.6507   0.06013
c=1, var=2  -2.4659   0.3177          10  -7.76    <0.0001   0.07828  0.02292
c=2, var=2  0.1811    0.2667          10  0.68     0.5125    0.5452   0.06613
c=1, var=3  -1.8384   0.3040          10  -6.05    0.0001    0.1372   0.03599
c=2, var=3  0.8086    0.2760          10  2.93     0.0150    0.6918   0.05884
c=1, var=4  -0.9605   0.2791          10  -3.44    0.0063    0.2768   0.05588
c=2, var=4  1.6865    0.2992          10  5.64     0.0002    0.8438   0.03944
c=1, var=5  -2.4509   0.3219          10  -7.61    <0.0001   0.07937  0.02352
c=2, var=5  0.1961    0.2721          10  0.72     0.4875    0.5489   0.06737
From these values, we can observe that for the category "c = 1, var = 1" the value of
the linear predictor is $\hat{\eta}_{11} = \hat{\eta}_1 + \widehat{\text{variety}}_1 = -2.0248$. Applying the
inverse link to $\hat{\eta}_{11}$ gives the probability $\hat{\pi}_{11} = 0.1166$ of observing
"Low"-quality grass in variety 1. For the category "c = 2, var = 1", the inverse of the
linear predictor is 0.6507, which is the estimate of the cumulative probability
$\pi_{11} + \pi_{21}$. Subtracting $\hat{\pi}_{11}$ from this value gives the probability that
variety 1 provides grass of "Medium" quality: $\hat{\pi}_{21} = 0.6507 - 0.1166 = 0.5341$.
With these two probability estimates, the probability that variety 1 yields
"Excellent"-quality turf is $\hat{\pi}_{31} = 1 - (\hat{\pi}_{11} + \hat{\pi}_{21}) = 1 - 0.6507 = 0.3493$.
Likewise, we obtain the probabilities $\hat{\pi}_{ci}$ for the remaining grass varieties.
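The hand calculation above extends to every variety. A minimal sketch, assuming the random effect $p_{ij}$ is set to zero (the ESTIMATE statements involve only fixed effects) and using the rounded solutions of Table 9.3(b); the function names are illustrative:

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Fixed-effect solutions from Table 9.3(b): one intercept (cutpoint) per
# cumulative logit and one effect per variety (variety 5 is the reference).
intercepts = {"Low": -2.4509, "Medium": 0.1961}
variety_effects = {1: 0.4261, 2: -0.01502, 3: 0.6125, 4: 1.4904, 5: 0.0}


def category_probabilities(variety):
    """P(Low), P(Medium), P(Excellent) with the random effect set to zero."""
    tau = variety_effects[variety]
    cum_low = inv_logit(intercepts["Low"] + tau)        # P(Y <= Low)
    cum_medium = inv_logit(intercepts["Medium"] + tau)  # P(Y <= Medium)
    return cum_low, cum_medium - cum_low, 1.0 - cum_medium


for v in variety_effects:
    low, medium, excellent = category_probabilities(v)
    print(f"Variety {v}: Low={low:.4f}, Medium={medium:.4f}, Excellent={excellent:.4f}")
```

For variety 1 this gives 0.1166, 0.5341, and 0.3493, matching the calculation in the text; the three probabilities of each variety sum to 1 by construction.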


9.3 Effect of Insecticides on Aphid Growth 


A cage experiment was used to investigate the effect of three insecticides on aphid
colonies with partial resistance to a common active compound. There were eight
treatments: all combinations of the three insecticides and a control (no insecticide)
with two types of colonies (susceptible or partially resistant). The experiment was
organized as a randomized complete block design (RCBD) with six blocks of eight
cages, and each cage in a block was assigned one treatment combination. A colony of
aphids was reared in each cage, and the number of live aphids was recorded before
the insecticide treatment was applied and then 2 and 6 days after application. Both
births and deaths could occur within each cage between evaluations. The dataset
from this experiment is shown in Table 9.5.

Following the same reasoning as in the previous examples, the components of the
GLMM with a Poisson response and repeated measures that models the number of
aphids ($y$) are described below.

Distributions: $y_{ijkl} \mid b_k, IC(b)_{ijk} \sim \text{Poisson}(\lambda_{ijkl})$, with $b_k \sim N(0, \sigma^2_{\text{block}})$ and
$IC(b)_{ijk} \sim N(0, \sigma^2_{\text{insecticide}\times\text{clone(block)}})$.

Linear predictor: $\eta_{ijkl} = \theta + I_i + C_j + (IC)_{ij} + b_k + IC(b)_{ijk} + \tau_l + (I\tau)_{il} + (C\tau)_{jl} + (IC\tau)_{ijl}$,
where $\eta_{ijkl}$ is the linear predictor, $\theta$ is the intercept, $I_i$ is the fixed effect
due to the $i$th insecticide treatment (control, D, H, or P), $C_j$ ($j = 1, 2$) is the
fixed effect due to the aphid clone, $(IC)_{ij}$ is the fixed effect due to the interaction
between insecticide and clone, $b_k$ is the random effect of the $k$th block, assuming
$b_k \sim N(0, \sigma^2_{\text{block}})$, $IC(b)_{ijk}$ is the random effect of the interaction between
insecticide and clone within blocks, assuming $IC(b)_{ijk} \sim N(0, \sigma^2_{\text{insecticide}\times\text{clone(block)}})$,
$\tau_l$ ($l = 1, 2, 3$) is the fixed effect due to measurement time, and $(I\tau)_{il}$, $(C\tau)_{jl}$,
and $(IC\tau)_{ijl}$ are the fixed effects due to the interactions of insecticide, clone,
and insecticide × clone with measurement time.
Table 9.5 Effect of insecticides (C = control, R = resistant, S = susceptible) on aphid growth

Block  Cage  Insecticide  Clone  Day 1  Day 2  Day 6
1      1     Control      R      60     111    220
1      2     Control      S      127    131    220
1      3     D            R      64     30     27
1      4     D            S      110    27     35
1      5     H            R      118    75     121
1      6     H            S      71     10     111
1      7     P            R      66     69     62
1      8     P            S      40     25     19
2      1     Control      R      54     152    156
2      2     Control      S      58     130    362
2      3     D            R      76     60     110
2      4     D            S      48     22     110
2      5     H            R      130    113    101
2      6     H            S      76     76     85
2      7     P            R      93     71     185
2      8     P            S      49     0      8
3      1     Control      R      94     175    292
3      2     Control      S      26     33     52
3      3     D            R      121    73     60
3      4     D            S      78     23     1
3      5     H            R      73     74     56
3      6     H            S      54     27     49
3      7     P            R      25     10     32
3      8     P            S      36     22     1
4      1     Control      R      75     134    238
4      2     Control      S      86     57     194
4      3     D            R      69     32     12
4      4     D            S      122    66     20
4      5     H            R      185    88     251
4      6     H            S      47     23     116


Link function: $\log(\lambda_{ijkl}) = \eta_{ijkl}$ is the link function that relates the linear
predictor to the mean ($\lambda_{ijkl}$).


The following SAS program fits the GLMM with a Poisson distribution to the
repeated measures.

proc glimmix nobound method=laplace;
   class ID Block Insecticide Cage Clone time;
   model y=Insecticide|Clone|time/dist=poi;
   random intercept Insecticide*Clone/subject=block;
   lsmeans Insecticide|Clone|time/lines ilink;
run;quit;
Table 9.6 Results of the analysis of variance in the Poisson GLMM

(a) Fit statistics

Fit statistic             CS       AR(1)    Toep(1)  UN
-2 Log likelihood         1125.54  1113.19  1127.25  Did not converge
AIC (smaller is better)   1177.54  1165.19  1177.25
AICC (smaller is better)  1202.17  1189.82  1199.67
BIC (smaller is better)   1161.58  1149.24  1161.91
CAIC (smaller is better)  1187.58  1175.24  1186.91
HQIC (smaller is better)  1142.52  1130.18  1143.59

(b) Fit statistics for conditional distribution

-2 Log L (y | r. effects)  1006.38
Pearson's chi-square       484.48
Pearson's chi-square/DF    5.77


Before fitting the final GLMM, we compare several covariance structures, assuming a
Poisson distribution for the response variable. According to the fit statistics, the
covariance structure that best models the data is the first-order autoregressive
structure (AR(1)). The value of the fit statistic of the conditional distribution,
Pearson's chi-square/DF = 5.77, is well above 1, which indicates extra variation
(overdispersion): the Poisson distribution does not adequately fit the data
(Table 9.6).
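Pearson's chi-square/DF compares the observed squared deviations with the variance the Poisson model assumes (variance equal to the mean), so values well above 1 signal overdispersion. The small simulation sketch below uses hypothetical simulated counts, not the aphid data. It contrasts counts that really are Poisson with gamma-Poisson (negative binomial) counts of the same mean, each fitted by an intercept-only Poisson model; the helper names are illustrative:

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by its residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


rng = random.Random(2024)
n, mean = 2000, 50.0

samples = {
    # Counts that really are Poisson: variance equals the mean
    "Poisson": [poisson_draw(mean, rng) for _ in range(n)],
    # Gamma-Poisson mixture (negative binomial): variance = mean + mean**2 / 2
    "gamma-Poisson": [poisson_draw(rng.gammavariate(2.0, mean / 2.0), rng)
                      for _ in range(n)],
}

ratios = {}
for label, y in samples.items():
    mu_hat = sum(y) / len(y)  # intercept-only Poisson fit: MLE is the sample mean
    ratios[label] = pearson_dispersion(y, [mu_hat] * len(y), n_params=1)
    print(f"{label}: Pearson chi-square/DF = {ratios[label]:.2f}")
```

The first ratio should be close to 1 and the second far above it (the mixture's variance is about 26 times its mean), which is the same kind of contrast seen between the Poisson fit (5.77) and the negative binomial fit (0.81) of the aphid data.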

Since the data are overdispersed, a highly recommended alternative is to find a more
appropriate distribution for this dataset. In this case, the linear predictor stays the
same, but a negative binomial distribution is now assumed for the response variable.
That is,

$y_{ijkl} \mid b_k, IC(b)_{ijk} \sim \text{Negative Binomial}(\lambda_{ijkl}, \phi)$


This negative binomial model arises by assuming that the conditional distribution of
the observations, given the random block and insecticide × clone(block) effects, is

$y_{ijkl} \mid b_k, IC(b)_{ijk} \sim \text{Poisson}(\lambda_{ijkl})$, where $\lambda_{ijkl} \sim \text{Gamma}(\lambda, \phi)$.

The resulting distribution of $y_{ijkl} \mid b_k, IC(b)_{ijk}$ is a negative binomial,
Negative Binomial$(\lambda_{ijkl}, \phi)$. The link function is $\log(\lambda_{ijkl}) = \eta_{ijkl}$.
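The laws of total expectation and variance show why the gamma mixture absorbs the extra variation. The short derivation below assumes a mean parameterization in which $E(\lambda) = \mu$ and $\operatorname{Var}(\lambda) = \phi\mu^2$ (subscripts dropped). This matches the variance function $\mu + \phi\mu^2$ used for the negative binomial scale parameter in SAS GLIMMIX.

```latex
\begin{aligned}
E(y) &= E\{E(y \mid \lambda)\} = E(\lambda) = \mu,\\
\operatorname{Var}(y) &= E\{\operatorname{Var}(y \mid \lambda)\} + \operatorname{Var}\{E(y \mid \lambda)\}\\
&= E(\lambda) + \operatorname{Var}(\lambda) = \mu + \phi\mu^{2}.
\end{aligned}
```

The variance therefore exceeds the Poisson variance $\mu$ whenever $\phi > 0$, and the Poisson model is recovered as $\phi \to 0$. For instance, with the scale estimate 0.1654 reported in Table 9.8, a hypothetical cage mean of 50 aphids implies a variance of about $50 + 0.1654 \times 50^2 \approx 463.5$ rather than 50.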
The following SAS code fits the GLMM with a negative binomial distribution. 


proc glimmix nobound method=laplace;
   class ID Block Insecticide Cage Clone time;
   model y=Insecticide|Clone|time/dist=negbin;
   random intercept Insecticide*Clone/subject=block;
   lsmeans Insecticide|Clone|time/lines ilink;
run;
Table 9.7 Fit statistics

(a) Fit statistics

-2 Log likelihood         878.02
AIC (smaller is better)   932.02
AICC (smaller is better)  956.41
BIC (smaller is better)   915.45
CAIC (smaller is better)  942.45
HQIC (smaller is better)  895.66

(b) Fit statistics for conditional distribution

-2 Log L (y | r. effects)  841.84
Pearson's chi-square       72.47
Pearson's chi-square/DF    0.81


Table 9.8 Estimated variance components and tests of fixed effects

(a) Covariance parameter estimates

Cov Parm  Subject  Estimate  Standard error
Variance  Block    0.06138   0.03429
AR(1)     Block    -0.7143   0.1710
Scale              0.1654    0.03575

(b) Type III tests of fixed effects

Effect                  Num DF  Den DF  Pr > F
Insecticide             3       19      <0.0001
Clone                   1       19      0.0086
Insecticide*clone       3       19      0.1161
Time                    2       44      0.0047
Insecticide*time        6       44      <0.0001
Clone*time              2       44      0.0275
Insecticide*clone*time  6       44      0.0663


Part of the results is shown in Table 9.7. The values of the fit statistics, assuming a
negative binomial distribution for the data, are shown in part (a), and the value of
the conditional statistic is shown in part (b) (Pearson's chi-square/DF = 0.81). This
value, close to 1, indicates that the overdispersion has been accounted for, so the
negative binomial distribution adequately models the response variable.

The estimated variance components under an AR(1) covariance structure are shown in
part (a) of Table 9.8. The estimates of the variance components of blocks, of the
interaction between insecticide and clone within blocks, and of the scale parameter
are $\hat{\sigma}^2_{\text{block}} = 0.06613$, $\hat{\sigma}_{\text{insecticide}\times\text{clone(block)}} = -0.7575$, and $\hat{\phi} = 0.1584$,
respectively. The type III tests of fixed effects (part (b)) indicate a significant effect
of insecticide type (P < 0.0001), clone (P = 0.0387), measurement time (P = 0.0137),
and the interactions insecticide × measurement time (P < 0.0001) and clone ×
measurement time (P = 0.0259) on the average number of aphids. The interaction
insecticide × clone × measurement time is close to significance (P = 0.0663).
Table 9.9 Estimates of insecticide least squares (LS) means on the model scale (Estimate)
and the data scale (Mean)

Insecticide  Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
C            4.7344    0.1478          19  32.03    <0.0001   113.79   16.8211
D            3.9647    0.1547          19  25.62    <0.0001   52.7043  8.1553
H            4.3733    0.1561          19  28.02    <0.0001   79.3010  12.3753
P            3.4892    0.1753          19  19.90    <0.0001   32.7588  5.7432

Table 9.10 Clone least squares means on the model scale (Estimate) and the data scale (Mean)

Clone  Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
R      4.4332    0.1320          19  33.58    <0.0001   84.1990  11.1158
S      3.8476    0.1890          19  20.36    <0.0001   46.8785  8.8586


Table 9.11 Insecticide*clone least squares means on the model scale (Estimate) and the
data scale (Mean)

Insecticide  Clone  Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
C            R                                                <0.0001            20.2032
C            S                                                <0.0001   98.0186  24.3008
D            R      4.0521    0.1886          19  21.49    <0.0001   57.5153  10.8459
D            S      3.8773    0.2337          19  16.59    <0.0001   48.2958  11.2858
H            R      4.7106    0.1997          19  23.59    <0.0001   111.11   22.1870
H            S      4.0359    0.2244          19  17.98    <0.0001   56.5964  12.7026
P            R      4.0866    0.2322          19  17.60    <0.0001   59.5346  13.8263
P            S      2.8918    0.2534          19  11.41    <0.0001   18.0255  4.5675

Table 9.12 Time least squares means on the model scale (Estimate) and the data scale (Mean)

Time  Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
1     4.2730    0.1434          44  29.79    <0.0001   71.7375  10.2905
2     3.9108    0.1454          44  26.90    <0.0001   49.9372  7.2603
3     4.2373    0.1457          44  29.09    <0.0001   69.2231  10.0830


The linear predictors and the estimated means of the factors and their interactions
are listed under the "Estimate" and "Mean" columns, respectively. The average
numbers of aphids are given for insecticide type in Table 9.9, for clone in
Table 9.10, for the insecticide*clone interaction in Table 9.11, for measurement time
in Table 9.12, for the insecticide*time interaction in Table 9.13, for the clone*time
interaction in Table 9.14, and for the insecticide*clone*time interaction in
Table 9.15.
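The data-scale columns in these tables follow from the model-scale columns by inverting the log link. The minimal sketch below uses the insecticide LS means of Table 9.9. It assumes that the data-scale standard errors are delta-method approximations, which matches the reported values up to rounding.

```python
import math

# Insecticide LS means on the log (model) scale from Table 9.9: (estimate, SE)
ls_means = {
    "C": (4.7344, 0.1478),
    "D": (3.9647, 0.1547),
    "H": (4.3733, 0.1561),
    "P": (3.4892, 0.1753),
}


def to_data_scale(estimate, se):
    """Invert the log link; approximate the SE with the delta method.

    d/dx exp(x) = exp(x), so SE(mean) is approximately exp(estimate) * SE(estimate).
    """
    mean = math.exp(estimate)
    return mean, mean * se


data_scale = {name: to_data_scale(est, se) for name, (est, se) in ls_means.items()}
for name, (mean, se_mean) in data_scale.items():
    print(f"{name}: mean = {mean:.2f} aphids, SE = {se_mean:.2f}")
```

Applying the same transformation to the Estimate columns of Tables 9.10 to 9.14 reproduces their Mean columns, again up to rounding.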
Table 9.13 Insecticide*time least squares means on the model scale (Estimate) and the data
scale (Mean)

Insecticide  Time  Estimate  Standard error  Pr > |t|  Mean     Standard error mean
C            1     4.2381    0.1930          <0.0001   69.2781  13.3733
C            2     4.6631    0.1913          <0.0001   105.96   20.2696
C            3     5.3019    0.1898          <0.0001   200.71   38.0898
D            1     4.4854    0.1965          <0.0001   88.7111  17.4275
D            2     3.7061    0.2014          <0.0001   40.6940  8.1959
D            3     3.7026    0.2035          <0.0001   40.5537  8.2517
H            1     4.4718    0.1978          <0.0001   87.5164  17.3133
H            2     3.9790    0.2016          <0.0001   53.4625  10.7804
H            3     4.6689    0.1977          <0.0001   106.59   21.0694
P            1     3.8967    0.2241          <0.0001   49.2403  11.0358
P            2     3.2949    0.2357          <0.0001   26.9755  6.3583
P            3     3.2759    0.2399          <0.0001   26.4664  6.3502

Table 9.14 Clone*time least squares means on the model scale (Estimate) and the data scale
(Mean)

Clone  Time  Estimate  Standard error  DF  t value  Pr > |t|  Mean     Standard error mean
R      1     4.3839    0.1595          44  27.49    <0.0001   80.1482  12.7828
R      2     4.2826    0.1605          44  26.68    <0.0001   72.4270  11.6256
R      3     4.6331    0.1601          44  28.94    <0.0001   102.83   16.4644
S      1     4.1621    0.2092          44  19.90    <0.0001   64.2093  13.4323
S      2     3.5390    0.2131          44  16.60    <0.0001   34.4308  7.3387
S      3     3.8416    0.2144          44  17.91    <0.0001   46.5989  9.9931
In this experiment, two types of pelleted feed were manufactured using different
amounts of whole sorghum. Using the whole grain resulted in one feed with a high
pellet durability index (PDI) and one with a low PDI. The researcher was interested
in how much this difference in PDI would affect the amount of intact pelleted feed
delivered to the different positions along the feeding line. The line was fed four
times with the high PDI feed and four times with the low PDI feed. After each run,
the total weight of the feed in each of the 12 identified trays was measured. The feed
in each tray was then sieved, and the crushed fine particles were weighed. The
response of interest was the ratio (proportion) of the weight of fine particles to the
total weight of the feed in each tray. The data for this experiment are in the
Appendix (Data: Feeding line experiment).

The experimental design used in this study was a split plot with the whole plots in a
completely randomized design. There were two fixed factors: feed, with 2 levels (high
PDI feed (H) and low PDI feed (L)), and tray, with 12 levels (locations 1, 2, 3, ..., 12
along the feed line). The runs (runs 1, 2, 3, and 4 of each feed in the feed line) may
influence the inference from this experiment, so it is advisable to analyze which
variance structure is suitable for this analysis.

The ANOVA table with the degrees of freedom for this experiment is shown in
Table 9.16.

Table 9.16 Results of the analysis of variance of the experiment

Sources of variation  Degrees of freedom
Feeding               (2 - 1) = 1
Feeding(running)      2(4 - 1) = 6
Tray                  (12 - 1) = 11
Feeding*tray          (2 - 1)(12 - 1) = 11
Feeding(tray*run)     2(12 - 1)(4 - 1) = 66
Total                 a × b × r - 1 = 95
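The degrees of freedom in Table 9.16 follow a simple pattern in the number of feed levels ($a = 2$), trays ($b = 12$), and runs per feed ($r = 4$). The small sketch below reproduces them. The function name split_plot_df is illustrative, and the effect labels follow Table 9.16:

```python
def split_plot_df(a, b, r):
    """Degrees of freedom for a split plot whose whole plots (a feeds, each run
    r times) are in a completely randomized design and whose subplots are b trays."""
    df = {
        "Feeding": a - 1,
        "Feeding(running)": a * (r - 1),            # whole-plot error
        "Tray": b - 1,
        "Feeding*tray": (a - 1) * (b - 1),
        "Feeding(tray*run)": a * (b - 1) * (r - 1),  # subplot error
    }
    df["Total"] = a * b * r - 1
    # The components must add up to the total degrees of freedom
    assert sum(v for key, v in df.items() if key != "Total") == df["Total"]
    return df


print(split_plot_df(a=2, b=12, r=4))
```

The internal check holds for any a, b, and r, because the five components always sum algebraically to abr - 1.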

The researcher aims to draw conclusions about how much the feed breaks down in the
feed line with the two types of feed, high PDI and low PDI. The following GLMM is
used to describe the experiment:

$y_{ijk} = \mu + \alpha_i + \alpha(r)_{ik} + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

where $y_{ijk}$ is the proportion observed in run $k$ ($k = 1, 2, 3, 4$), tray $j$
($j = 1, 2, \ldots, 12$), and feed $i$ ($i = 1, 2$); $\mu$ is the overall mean; $\alpha_i$ is the fixed
effect of feed $i$; $\alpha(r)_{ik}$ is the random effect of the $i$th feed within the $k$th run,
assuming $\alpha(r)_{ik} \sim N(0, \sigma^2_{\alpha(r)})$; $\beta_j$ is the fixed effect due to the $j$th tray;
$(\alpha\beta)_{ij}$ is the effect of the interaction between the $i$th feed and the $j$th tray; and
$\varepsilon_{ijk}$ is the experimental error.

The components of the conditional GLMM, assuming that the response variable follows
a beta distribution, are listed below.

The distribution of the response variable is $y_{ijk} \mid \alpha(r)_{ik} \sim \text{Beta}(\pi_{ijk}, \phi)$, where
$\pi_{ijk}$ is the mean proportion and $\phi$ is the scale parameter. The linear predictor is
$\eta_{ijk} = \mu + \alpha_i + \alpha(r)_{ik} + \beta_j + (\alpha\beta)_{ij}$, with link function
$\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right) = \eta_{ijk}$. The following GLIMMIX syntax fits a GLMM
with a beta distribution.


proc glimmix method=laplace;
   class tray feed run;
   model ratio=feed|tray/dist=beta;
   random intercept/subject=feed(run) type=toep(1);
   lsmeans feed|tray/lines ilink;
run;


Part of the output is shown below. Four covariance structures ("CS," "AR(1),"
"Toep(1)," and "UN") were tested to see which one best fits the response variable.
Of these, "Toep(1)" produced the best fit statistics (part (a), Table 9.17).

Another important result for deciding how to continue with the analysis is the
conditional distribution statistic (Pearson's chi-square/DF = 0.96), whose value
indicates that the beta model adequately fits the data. The fixed effects tests
(part (c)) indicate a statistically significant effect of feed type (P < 0.0001) and
tray (P < 0.0001).

Table 9.17 Results of the analysis of variance

(a) Fit statistics

-2 Log likelihood         -429.84
AIC (smaller is better)   -377.84
AICC (smaller is better)  -357.19
BIC (smaller is better)   -375.78
CAIC (smaller is better)  -349.78
HQIC (smaller is better)  -391.77

(b) Fit statistics for conditional distribution

-2 Log L (ratio | r. effects)  -453.24
Pearson's chi-square           91.40
Pearson's chi-square/DF        0.96

(c) Type III tests of fixed effects

Effect     Num DF  Den DF  F value  Pr > F
Feed       1       6       1071.19  <0.0001
Tray       11      65      18.03    <0.0001
Tray*feed  11      65      1.83     0.0660

Table 9.18 Feed least squares means on the model scale (Estimate) and the data scale (Mean)

Feed  Estimate  Standard error  DF  t value  Pr > |t|  Mean    Standard error mean
H     -2.0009   0.07409         6   -27.01   <0.0001   0.1191  0.007773
L     1.3832    0.07208         6   19.19    <0.0001   0.7995  0.01155

The linear predictors and the estimated probabilities (mean proportions) of the
factors and their interaction are listed under the "Estimate" and "Mean" columns,
respectively, of Table 9.18 (feed type), Table 9.19 (tray), and Table 9.20 (the
feed*tray interaction).
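The "Mean" columns of Tables 9.18 to 9.20 are inverse-logit transforms of the "Estimate" columns. The minimal sketch below works through the feed LS means in Table 9.18. It assumes that the data-scale standard errors are delta-method approximations, $\text{SE}(\hat{\pi}) \approx \hat{\pi}(1 - \hat{\pi})\,\text{SE}(\hat{\eta})$:

```python
import math


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


# Feed LS means on the logit (model) scale from Table 9.18: (estimate, SE)
feed_means = {"H": (-2.0009, 0.07409), "L": (1.3832, 0.07208)}

results = {}
for feed, (est, se) in feed_means.items():
    p = inv_logit(est)
    # Delta method: d inv_logit(eta) / d eta = p * (1 - p)
    results[feed] = (p, p * (1.0 - p) * se)
    print(f"Feed {feed}: mean proportion of fines = {p:.4f}, SE = {results[feed][1]:.5f}")
```

The result reproduces the table: the low-PDI feed (L) is expected to yield a fines ratio of about 0.80, against about 0.12 for the high-PDI feed (H).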


9.5 Characterization of Spatial and Temporal Variations in Fecal Coliform Density


During a 1-month period (May to June 1981), 30 river water samples were collected
from the channel at 3 stations, A, B, and C (downstream to upstream), on 5 randomly
selected days at 9:00 a.m. and 3:00 p.m. (1 sample per station per sampling time per
day). Each sample was analyzed for fecal coliform by method FC-96. The data from
this experiment are shown in Table 9.21.
Table 9.19 Tray least squares means on the model scale (Estimate) and the data scale (Mean)

Tray  Estimate  Standard error  DF  t value  Pr > |t|  Mean    Standard error mean
01    -0.5652   0.08182         65  -6.91    <0.0001   0.3623  0.01891
02    -0.6607   0.08531         65  -7.74    <0.0001   0.3406  0.01916
03    -0.6950   0.08822         65  -7.88    <0.0001   0.3329  0.01959
04    -0.2958   0.08100         65  -3.65    0.0005    0.4266  0.01981
05    -0.3773   0.08212         65  -4.59    <0.0001   0.4068  0.01982
06    -0.2947   0.08057         65  -3.66    0.0005    0.4268  0.01971
07    -0.3520   0.08165         65  -4.31    <0.0001   0.4129  0.01979
08    -0.2992   0.07939         65  -3.77    0.0004    0.4258  0.01941
09    -0.1314   0.07670         65  -1.71    0.0916    0.4672  0.01909
10    -0.3935   0.08096         65  -4.86    <0.0001   0.4029  0.01948
11    0.1571    0.07860         65  2.00     0.0499    0.5392  0.01953
12    0.2014    0.07949         65  2.53     0.0137    0.5502  0.01967


Table 9.20 Tray*feed least squares means on the model scale (Estimate) and the data scale (Mean)

Tray  Feed  Estimate  Standard error  t value  Pr > |t|  Mean     Standard error mean
01    H     -2.2408   0.1284          -17.46   <0.0001   0.09614  0.01116
01    L     1.1104    0.1015          10.94    <0.0001   0.7522   0.01892
02    H     -2.4581   0.1369          -17.95   <0.0001   0.07885  0.009946
02    L     1.1367    0.1018          11.17    <0.0001   0.7571   0.01872
03    H     -2.4724   0.1375          -17.98   <0.0001   0.07782  0.009869
03    L     1.0823    0.1105          9.79     <0.0001   0.7469   0.02089
04    H     -2.0307   0.1217          -16.69   <0.0001   0.1160   0.01248
04    L     1.4391    0.1070          13.45    <0.0001   0.8083   0.01658
05    H     -2.1481   0.1254          -17.13   <0.0001   0.1045   0.01174
05    L     1.3935    0.1061          13.13    <0.0001   0.8011   0.01690
06    H     -2.0087   0.1208          -16.62   <0.0001   0.1183   0.01260
06    L     1.4192    0.1066          13.31    <0.0001   0.8052   0.01673
07    H     -2.1026   0.1242          -16.93   <0.0001   0.1088   0.01204
07    L     1.3987    0.1061          13.18    <0.0001   0.8020   0.01685
08    H     -1.9310   0.1192          -16.20   <0.0001   0.1266   0.01318
08    L     1.3325    0.1050          12.69    <0.0001   0.7913   0.01734
09    H     -1.6240   0.1113          -14.59   <0.0001   0.1647   0.01531
09    L     1.3613    0.1056          12.89    <0.0001   0.7960   0.01715
10    H     -2.0863   0.1238          -16.85   <0.0001   0.1104   0.01216
10    L     1.2994    0.1044          12.45    <0.0001   0.7857   0.01757
11    H     -1.4559   0.1075          -13.55   <0.0001   0.1891   0.01648
11    L     1.7701    0.1148          15.42    <0.0001   0.8545   0.01427
12    H     -1.4519   0.1076          -13.50   <0.0001   0.1897   0.01653
12    L     1.8548    0.1171          15.83    <0.0001   0.8647   0.01371
Table 9.21 Variation in fecal coliform densities of the river water samples from three
sampling stations on five sampling days at 9:00 a.m. (TM = 1) and 3:00 p.m. (TM = 2)

Sampling date  TM         Site  No. of coliforms per milliliter
18 May         9:00 a.m.  A     648
18 May         3:00 p.m.  A     798
18 May         9:00 a.m.  B     517
18 May         3:00 p.m.  B     702
18 May         9:00 a.m.  C     532
18 May         3:00 p.m.  C     55
26 May         9:00 a.m.  A     1421
26 May         3:00 p.m.  A     1388
26 May         9:00 a.m.  B     1883
26 May         3:00 p.m.  B     1855
26 May         9:00 a.m.  C     1724
26 May         3:00 p.m.  C     1769
29 May         9:00 a.m.  A     1523
29 May         3:00 p.m.  A     759
29 May         9:00 a.m.  B     1361
29 May         3:00 p.m.  B     603
29 May         9:00 a.m.  C     2004
29 May         3:00 p.m.  C     541
1 June         9:00 a.m.  A     1987
1 June         3:00 p.m.  A     1056
1 June         9:00 a.m.  B     1796
1 June         3:00 p.m.  B     1579
1 June         9:00 a.m.  C     1221
1 June         3:00 p.m.  C     1223
5 June         9:00 a.m.  A     870
5 June         3:00 p.m.  A     1099
5 June         9:00 a.m.  B     920
5 June         3:00 p.m.  B     951
5 June         9:00 a.m.  C     926
5 June         3:00 p.m.  C     887


To assess the relative magnitudes of the sources of variation due to time, site, and
subsampling on the number of coliforms per milliliter ($y_{ijk}$), an analysis of variance
using a GLMM with a Poisson response was performed, as described below.

We denote by $y_{ijk}$ the number of colonies per milliliter, whose conditional
distribution is $y_{ijk} \mid \text{sampling(site)}_{ik} \sim \text{Poisson}(\lambda_{ijk})$, with the linear predictor
$\eta_{ijk}$ defined by

$\eta_{ijk} = \theta + \text{site}_i + \text{sampling(site)}_{ik} + \text{time}_j + (\text{site} \times \text{time})_{ij} \quad (i = 1, 2, 3;\ j = 1, 2, 3, 4, 5;\ k = 1, 2)$

where $\eta_{ijk}$ is the linear predictor that relates the linear function to the mean, $\theta$ is
the intercept, $\text{site}_i$ is the fixed effect due to sampling site $i$, $\text{sampling(site)}_{ik}$ is
the random effect due to the sampling time nested within the site, assuming
$\text{sampling(site)}_{ik} \sim N(0, \sigma^2_{\text{sampling(site)}})$, $\text{time}_j$ is the fixed effect due to the sampling
date, and $(\text{site} \times \text{time})_{ij}$ is the effect of the interaction between site and sampling
date. The link function for this model is $\log(\lambda_{ijk}) = \eta_{ijk}$.

The following GLIMMIX syntax fits a GLMM with a Poisson response.

proc glimmix data=ufc nobound method=laplace;
   class T TM Site;
   model ufc=Site|T/dist=Poisson link=log;
   random intercept/subject=TM(Site) type=toep(1);
   lsmeans Site|T/lines ilink;
run;

Table 9.22 Results of the analysis of variance

(a) Covariance structures

Fit statistic             Toep(1)  CS       AR(1)    UN
-2 Log likelihood         2022.64  2022.64  2022.64  2022.64
AIC (smaller is better)   2054.64  2056.64  2056.64  2054.64
AICC (smaller is better)  2096.49  2107.64  2107.64  2096.49
BIC (smaller is better)   2051.31  2053.10  2053.10  2051.31
CAIC (smaller is better)  2067.31  2070.10  2070.10  2067.31
HQIC (smaller is better)  2041.31  2042.47  2042.47  2041.31

(b) Fit statistics for conditional distribution

-2 Log L (ufc | r. effects)  1989.36
Pearson's chi-square         1632.28
Pearson's chi-square/DF      54.41

(c) Type III tests of fixed effects

Effect  Num DF  Den DF  F value  Pr > F
Site    2       3       1.41     0.3700
T       4       12      956.04   <0.0001
T*site  8       12      82.44    <0.0001


Part of the results is summarized in Table 9.22. To determine which covariance structure best models the response variable, four structures were tested (part (a)), all of which produced very similar results. Toep(1) and UN gave the smallest information criteria (for example, AIC = 2054.64), and the "Toep(1)" covariance structure was chosen. Under this structure, the conditional distribution statistic is Pearson's chi-square/DF = 54.41 (part (b)). Because this value is far greater than 1, it indicates strong overdispersion in the dataset. Therefore, it is important to look for an alternative distribution that solves this problem.
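As a rough illustration of what the Pearson chi-square/DF statistic measures, the following Python sketch computes it for a Poisson model from observed counts and fitted means. The counts, fitted means, and number of parameters are hypothetical toy values, not the ufc data, and the residual degrees of freedom n - p is a simplifying assumption; GLIMMIX computes the conditional statistic from conditional means that include the predicted random effects.

```python
def pearson_chi2_per_df(y, mu, n_params, variance=lambda m: m):
    """Pearson chi-square divided by its degrees of freedom.

    variance(mu) is the model's variance function; the default,
    variance = mean, corresponds to the Poisson distribution.
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    df = len(y) - n_params  # simple residual degrees of freedom (assumption)
    return chi2 / df


# Hypothetical counts whose spread is larger than their mean
y = [0, 2, 4]
mu = [2.0, 2.0, 2.0]
print(pearson_chi2_per_df(y, mu, n_params=1))
```

Here the statistic equals 2.0: the squared deviations are, on average, twice what a Poisson variance allows. A value such as the 54.41 in Table 9.22 points to far more severe overdispersion.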

The hypothesis tests in part (c) indicate a significant effect of the date of sampling (P < 0.0001), as well as of the interaction between the site and date
Table 9.23 Fit statistics

(a) Fit statistics under the negative binomial distribution

-2 Log likelihood | 436.98
AIC (smaller is better) | 470.98
AICC (smaller is better) | 521.98
BIC (smaller is better) | 467.44
CAIC (smaller is better) | 484.44
HQIC (smaller is better) | 456.81

(b) Fit statistics for conditional distribution

-2 Log L (ufc | r. effects) | 432.83
Pearson's chi-square | 22.66
Pearson's chi-square/DF | 0.76


of sampling (P < 0.0001). That is, the concentration of fecal coliform units per milliliter is affected by the date of data collection. However, the data show excessive dispersion. One way to check for and deal with overdispersion is to fit a quasi-Poisson model, which adds a dispersion parameter during the fitting process to account for the extra variance. Another option is to look for a distribution that adequately fits the data; in this case, the negative binomial distribution is a good alternative.

Next, we implement the analysis assuming that the response variable follows a negative binomial distribution. This means that the distribution of $y_{ijk}$ (number of colonies per milliliter) is given by $y_{ijk} \mid \mathrm{sampling(site)}_{ik} \sim \mathrm{NegativeBinomial}(\lambda_{ijk}, \phi)$, where $\phi$ is the scale parameter. The linear predictor $\eta_{ijk}$ and the link function remain unchanged.
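The practical difference between the two distributions lies in the variance function. The sketch below assumes the parameterization we understand GLIMMIX to use for dist=negbin, in which the conditional variance is mu + k*mu^2 with scale parameter k (written phi above), and compares the Pearson chi-square/DF of the same hypothetical counts under both variance functions.

```python
def poisson_variance(mu):
    """Poisson variance function: the variance equals the mean."""
    return mu


def negbin_variance(mu, k):
    """Negative binomial variance function mu + k * mu**2 (k = scale parameter)."""
    return mu + k * mu ** 2


def pearson_chi2_per_df(y, mu, n_params, variance):
    """Pearson chi-square divided by residual degrees of freedom n - n_params."""
    chi2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Hypothetical overdispersed counts with a common fitted mean of 2
y = [0, 2, 4]
mu = [2.0, 2.0, 2.0]
print(pearson_chi2_per_df(y, mu, 1, poisson_variance))
print(pearson_chi2_per_df(y, mu, 1, lambda m: negbin_variance(m, 0.5)))
```

With k = 0.5 the negative binomial variance at mu = 2 is 4 instead of 2, so the statistic falls from 2.0 to 1.0: the extra quadratic term absorbs the excess variability.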

The following GLIMMIX commands fit a GLMM with a negative binomial 
distribution. 


proc glimmix data=ufc nobound method=laplace;
  class T TM Site;
  /* negative binomial response; the default link for dist=negbin is log */
  model ufc = Site|T / dist=negbin;
  random intercept / subject=TM(Site) type=toep(1);
  lsmeans Site|T / lines ilink;
run;


Part of the output of the above program is shown below. The fit statistics under the negative binomial distribution (part (a) of Table 9.23) are much smaller than those obtained under the Poisson model (for example, AIC = 470.98 versus 2054.64), indicating that the negative binomial distribution fits the response variable adequately. Furthermore, the conditional distribution statistic indicates that the negative binomial distribution is a good choice for these data (Pearson's chi-square/DF = 0.76).

This statistic (Pearson's chi-square/DF = 0.76) indicates how many times larger the observed variability is than the variability assumed by the model (under a Poisson model, the variance equals the mean). Since this value is less than 1 (part (b)), no excess variability remains, indicating that overdispersion has been removed in the fitting of the data. Another direct effect
Table 9.24 Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Site | 2 | 3 | 0.78 | 0.5346
T | 4 | 12 | 11.57 | 0.0004
T*site | 8 | 12 | 1.13 | 0.4096


Table 9.25 Means and standard errors on the model scale (Estimate) and on the data scale (Mean) of the sampling site data

Site | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
A | 7.0195 | 0.1270 | 3 | 55.25 | <0.0001 | 1118.25 | 142.06
B | 7.0243 | 0.1271 | 3 | 55.28 | <0.0001 | 1123.57 | 142.78
C | 6.8237 | 0.1305 | 3 | 52.30 | <0.0001 | 919.40 | 119.95


Table 9.26 Means and standard errors of measurement time on the model scale (Estimate) and the data scale (Mean)

T | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
1 | 6.2084 | 0.1467 | 12 | 42.32 | <0.0001 | 496.91 | 72.8990
2 | 7.4212 | 0.1420 | 12 | 52.27 | <0.0001 | 1670.97 | 237.23
3 | 7.0074 | 0.1455 | 12 | 48.16 | <0.0001 | 1104.74 | 160.75
4 | 7.2910 | 0.1418 | 12 | 51.42 | <0.0001 | 1466.97 | 208.00
5 | 6.8513 | 0.1422 | 12 | 48.19 | <0.0001 | 945.09 | 134.35


observed when there is no overdispersion is seen in the F-values of the fixed effects tests (Table 9.24), which are much smaller than those obtained under the Poisson model. In this case, the date on which the samples were collected was significant (P = 0.0004), but, unlike the case when the data were fitted using the Poisson GLMM, the interaction between the two factors was not (P = 0.4096).

The linear predictors and estimated means of the main effects and of the interaction between both factors are under the columns “Estimate” and “Mean,” respectively. The sampling site averages are presented in Table 9.25.

The averages by sampling date are listed in Table 9.26.

The means of the site × sampling date interaction are shown in Table 9.27.
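The “Mean” column is obtained from the “Estimate” column through the inverse of the log link. The Python sketch below shows that back-transformation, approximating the data-scale standard error with the delta method; that this is how the tabulated standard errors were computed is our assumption, although it reproduces Table 9.25 up to rounding.

```python
import math


def ilink_log(estimate, se):
    """Back-transform a log-scale estimate and its standard error.

    Mean = exp(estimate); by the delta method,
    SE(mean) = exp(estimate) * SE(estimate).
    """
    mean = math.exp(estimate)
    return mean, mean * se


# Site A in Table 9.25: Estimate = 7.0195, standard error = 0.1270
mean, se_mean = ilink_log(7.0195, 0.1270)
print(round(mean, 1), round(se_mean, 1))
```

The result is approximately 1118.2 with a standard error of about 142.0, matching the tabulated 1118.25 and 142.06 up to the rounding of the printed estimate.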


9.6 Log-Normal Distribution 


Positively skewed distributions are very common, especially when modeling biological data. Data often have a lower bound, usually 0 or the detection limit, but no restriction on the upper bound. Therefore, below the median, no observation can lie further away than the lower bound, whereas above the median some values may lie many times further away, giving a positively skewed distribution. Such skewed distributions can often be approximated by a log-normal distribution (Limpert et al. 2001).
Table 9.27 Means and standard errors for the interaction T*site on the model scale (Estimate) and the data scale (Mean)

Site | T | Estimate | Standard error | Pr > |t| | Mean | Standard error mean
A | 1 | 6.5905 | 0.2463 | <0.0001 | 728.17 | 179.35
B | 1 | 6.4197 | 0.2466 | <0.0001 | 613.79 | 151.38
C | 1 | 5.6151 | 0.2772 | <0.0001 | 274.53 | 76.1038
A | 2 | 7.2508 | 0.2452 | <0.0001 | 1409.17 | 345.59
B | 2 | 7.5367 | 0.2451 | <0.0001 | 1875.54 | 459.64
C | 2 | 7.4761 | 0.2463 | <0.0001 | 1765.30 | 434.71
A | 3 | 7.0336 | 0.2461 | <0.0001 | 1134.09 | 279.14
B | 3 | 6.8855 | 0.2465 | <0.0001 | 978.01 | 241.05
C | 3 | 7.1030 | 0.2586 | <0.0001 | 1215.59 | 314.37
A | 4 | 7.3224 | 0.2458 | <0.0001 | 1513.87 | 372.07
B | 4 | 7.4329 | 0.2450 | <0.0001 | 1690.62 | 414.28
C | 4 | 7.1176 | 0.2463 | <0.0001 | 1233.47 | 303.81
A | 5 | 6.9003 | 0.2460 | <0.0001 | 992.56 | 244.21
B | 5 | 6.8467 | 0.2458 | <0.0001 | 940.73 | 231.25
C | 5 | 6.8069 | 0.2453 | <0.0001 | 904.07 | 221.80
Fig. 9.1 Density function of the log-normal distribution with parameters 1 and 0.6


A log-normal distribution is characterized by having only positive nonzero values, positive skewness, a nonconstant variance that is proportional to the square of the mean, and a normally distributed natural logarithm. Its probability density function is asymmetric, with most of the data below the expected value and a long, thin right tail of higher values. Figure 9.1 shows the positive skewness of a log-normal distribution with parameters μ = 1 and σ = 0.6, that is, a distribution whose natural logarithm has mean 1 and standard deviation 0.6.
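The statement that the variance is proportional to the square of the mean can be checked directly from the standard log-normal moments: if log(Y) ~ N(mu, sigma^2), then E(Y) = exp(mu + sigma^2/2) and Var(Y) = (exp(sigma^2) - 1) exp(2 mu + sigma^2). A small Python sketch, using the parameters of Fig. 9.1:

```python
import math


def lognormal_moments(mu, sigma):
    """Mean and variance of Y when log(Y) ~ Normal(mu, sigma**2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, variance


# Parameters of Fig. 9.1: mu = 1 and sigma = 0.6 on the log scale
mean, variance = lognormal_moments(1.0, 0.6)

# variance / mean**2 equals exp(sigma**2) - 1 whatever mu is,
# i.e. the variance is proportional to the square of the mean.
for mu in (0.0, 1.0, 3.0):
    m, v = lognormal_moments(mu, 0.6)
    print(mu, round(v / m ** 2, 4))
```

For Fig. 9.1 the mean is exp(1.18), about 3.25, above the median exp(1), about 2.72, which reflects the long right tail; every line of the loop prints the same ratio, exp(0.36) - 1, about 0.4333.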
9.6.1 Emission of Nitrous Oxide (N2O) in Beef Cattle Manure
with Different Percentages of Crude Protein in the Diet


The experiment was conducted between January and February 2017 at the Colegio de Postgraduados Campus Córdoba, located in Amatlán de los Reyes, Veracruz, México. The animals used were four 5-6-month-old males of the Criollo lechero tropical (CLT) breed, randomly distributed in individual pens of 4.8 × 2.1 m², each with 75% shade, a cup drinker, and a drawer-type feeder. To ensure the required crude protein percentages for each treatment, the following diets (treatments 1-4) were developed: Trt1 (12% crude protein), Trt2 (14% crude protein), Trt3 (16% crude protein), and Trt4 (commercial feed with 16% crude protein). Each animal received the four treatments in random order in different periods. Each treatment was applied for 11 days, of which the first 7 were considered adaptation days and the following 4 days were used to measure gases in the daily accumulated excreta. The experiment had a total duration of 44 days. The data from this experiment are tabulated in the Appendix (Data: Nitrous oxide emission). The N2O gas fluxes in ppm were calculated from a linear or nonlinear increase of the concentrations inside the static chambers over time, and these fluxes were converted to micrograms of N2O-N per m² per hour (y); for more details, see Hernández-Tapia et al. (2019). The statistical model used in this study was an analysis of covariance model in a randomized complete block design with repeated measures, as described below.


$$y_{ijk} = \mu + \tau_i + \mathrm{animal}_j + \mathrm{time}_k + (\tau \times \mathrm{time})_{ik} + \beta_{ik}(x_{ijk} - \bar{x}) + \varepsilon_{ijk}$$


where $y_{ijk}$ is the flux of N2O-N (μg m^-2 h^-1); μ is the overall mean; $\tau_i$ is the fixed effect due to treatment i (i = 1, 2, 3, 4); $\mathrm{animal}_j$ is the random effect due to animal j (j = 1, 2, 3, 4), assuming $\mathrm{animal}_j \sim N(0, \sigma^2_{\mathrm{animal}})$; $\mathrm{time}_k$ is the fixed effect of time k (k = 1, 2, 3, 4, 5) at the time of measurement; $(\tau \times \mathrm{time})_{ik}$ is the effect of the interaction between $\tau_i$ and $\mathrm{time}_k$; $\beta_{ik}$ is the linear regression coefficient of the covariate $x_{ijk}$ in treatment i and time k, where $x_{ijk}$ can be the pH, humidity (HE), or temperature (TE) of the manure, the maximum temperature (TMaxA), minimum temperature (TMinA), maximum humidity (HMaxA), or minimum humidity (HMinA) of the environment, or the initial weight (kilograms) at the start of a treatment; $\bar{x}$ is the mean of the covariate in question; and $\varepsilon_{ijk}$ is the non-normal experimental error.

The linear predictor for N2O-N is $\eta_{ijk} = \mu + \tau_i + \mathrm{animal}_j + \mathrm{time}_k + (\tau \times \mathrm{time})_{ik} + \beta_{ik}(x_{ijk} - \bar{x})$. The response variable y has a conditional log-normal distribution, that is, $y_{ijk} \mid \mathrm{animal}_j \sim \mathrm{LogNormal}(\eta_{ijk}, \sigma^2)$, with mean $e^{\eta_{ijk} + \sigma^2/2}$ and variance $(e^{\sigma^2} - 1)\, e^{2\eta_{ijk} + \sigma^2}$. The rest of the parameters have already been described above.
The following GLIMMIX syntax fits a GLMM with a log-normal response:


proc glimmix data=co2 nobound method=laplace;
  class animal trt time;
  /* xbar holds the TMinA covariate; it enters with its interactions with trt and time */
  model Nox = trt|time|xbar / dist=lognormal;
  random intercept / subject=animal type=cs;
  lsmeans trt|time / lines ilink;
run;


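Although most of the statements have already been described in previous chapters, note that in this example the TMinA covariate enters the model through the variable "xbar," that is, expressed relative to its mean, $x_{ijk} - \bar{x}$. Part of the output is shown below.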

Gas emissions from cattle manure, regardless of the treatment applied, are influenced by several factors (covariates) that the researcher cannot control and that have a significant effect on the estimation of means and of experimental error. These covariates are assumed to be linearly related to the response variable. Covariates such as pH, humidity, and temperature of the excreta, as well as the temperature and humidity (maximum and minimum) of the environment, influence the dynamics of gas emission. These covariates were considered and analyzed in the covariance model to adjust the estimated means of the N2O-N flux. Based on the fit statistics obtained from the proposed models (Table 9.28), the model that best explains the variability of the N2O-N flux is model 5, because it provides the lowest values of AIC, AICC, BIC, and MSE (mean square error). Therefore, the model that provides the best fit, or explains the most variability in the N2O-N flux, is the one that includes the minimum environmental temperature.

The conditional fit statistics (part (a)) and the estimated variance components (part (b)) are shown in Table 9.29. The type III fixed effects tests (part (c)) indicate significant effects of Trt (P = 0.0008), time (P = 0.0288), the interaction Trt × time (P = 0.0140), the covariate Tmin (P = 0.0079), and the interaction Tmin × Trt (P = 0.0038).

The average N2O-N emissions of Trt1 (12% CP: crude protein) and Trt2 (14% CP) were statistically different from each other. Treatment 1 emitted the highest N2O-N flux despite being the treatment with the lowest percentage of CP (Table 9.30).
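The covariate term $\beta_{ik}(x_{ijk} - \bar{x})$ uses the covariate as a deviation from its mean. The sketch below shows one way to build such a centered column before fitting; the TMinA readings are hypothetical values, and whether the variable xbar in the SAS program was created exactly this way is an assumption.

```python
def center(values):
    """Return each value minus the arithmetic mean of all values."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical minimum air temperatures (TMinA, degrees C) for five records
tmina = [18.0, 19.5, 21.0, 20.0, 21.5]
xbar = center(tmina)
print(xbar)
```

Centering leaves the estimated slopes unchanged, but it makes the treatment effects refer to the average value of the covariate rather than to a covariate value of zero.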


9.7 Effect of a Chemical Salt on the Percentage Inhibition
of Fusarium sp.


In order to observe the tolerance of the fungus Fusarium sp. to different concentrations of a chemical salt, a bioassay was implemented to evaluate the percentage of inhibition of the fungus. This bioassay consisted of placing a nutritive culture medium for fungal development in Petri dishes, to which different concentrations of the salt (0, 500, 1000, and 2000 ppm) were added. Mycelium growth was measured for 6 days, and the percentage of inhibition of Fusarium sp. growth was calculated. Part of the data is shown below, and the complete dataset is in the Appendix (Data: Percentage inhibition).
Table 9.29 Conditional fit statistics, variance components, and type III fixed effect tests

(a) Fit statistics for conditional distribution

-2 Log L (F | r. effects) | 333.62
Pearson's chi-square | 99.06
Pearson's chi-square/DF | 0.76

(b) Covariance parameter estimates

Cov Parm | Subject | Estimate | Standard error
Variance | A | -0.01561 |
CS | A | 0.000767 |
Residual | | 0.7776 | 0.09845

(c) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Trt | 3 | 88 | 6.13 | 0.0008
Time | 4 | 88 | 2.84 | 0.0288
Trt*time | 11 | 88 | 2.34 | 0.0140
Tmin | 1 | 88 | 7.40 | 0.0079
Tmin*Trt | 3 | 88 | 4.81 | 0.0038
Tmin*time | 4 | 88 | 1.80 | 0.1351
Tmin*Trt*time | 11 | 88 | 1.23 | 0.2814

Table 9.30 Mean and standard error of N2O-N flux (μg of N2O-N m^-2 h^-1) of the different treatments under study

Treatment | N2O (μ) ± standard error
Trt1 (12% CP) | 3.6442 ± 0.2213 a
Trt2 (14% CP) | 3.0714 ± 0.3119 b
Trt3 (16% CP) | 3.5706 ± 0.2974 ab
Trt4 (16% CP, commercial feed) | 3.1205 ± 0.2130 ab
Bio Day Conc Rep 
1 1 0 3 
1 1 0 4 
1 2 0 2 
1 2 0 3 
1 3 0 2 
1 3 0 3 
1 4 0 3 
1 4 500 1 
1 5 0 3 
1 5 0 4 
1 6 1000 3 
1 6 1000 4 
1 6 2000 1 


Bio Day Conc Rep Y Bio Day Conc Rep Y

1 6 2000 2 29.655 2 6 2000 4 30.522 
1 6 2000 3 31.724 

1 6 2000 4 35.172 


Following the same reasoning as in previous examples, the components of the repeated-measures GLMM with a beta response distribution for the percentage inhibition of Fusarium sp. ($y_{ijkl}$) are listed below:

Distributions: $y_{ijkl} \mid \omega_{l(k)}, \mathrm{conc}(\omega)_{i(kl)} \sim \mathrm{Beta}(\pi_{ijkl}, \phi)$; i = 1, ..., 4; j = 1, ..., 6; k = 1, 2; l = 1, ..., 4; $\omega_{l(k)} \sim N(0, \sigma^2_{\omega})$; $\mathrm{conc}(\omega)_{i(kl)} \sim N(0, \sigma^2_{\mathrm{conc}(\omega)})$

Linear predictor: $\eta_{ijkl} = \theta + \mathrm{conc}_i + \omega_{l(k)} + \mathrm{conc}(\omega)_{i(kl)} + \mathrm{time}_j + (\mathrm{conc} \times \mathrm{time})_{ij}$

where $\eta_{ijkl}$ is the linear predictor, θ is the intercept, $\mathrm{conc}_i$ is the fixed effect of salt concentration, $\omega_{l(k)}$ is the random effect of the Petri dish within the bioassay, assuming $\omega_{l(k)} \sim N(0, \sigma^2_{\omega})$, $\mathrm{conc}(\omega)_{i(kl)}$ is the random effect of salt concentration × Petri dish × bioassay, assuming $\mathrm{conc}(\omega)_{i(kl)} \sim N(0, \sigma^2_{\mathrm{conc}(\omega)})$, $\mathrm{time}_j$ is the fixed effect due to the day of measurement, and $(\mathrm{conc} \times \mathrm{time})_{ij}$ is the interaction effect of chemical salt concentration with the day of measurement.

Link function: $\mathrm{logit}(\pi_{ijkl}) = \eta_{ijkl}$ is the link function that relates the linear predictor to the mean ($\pi_{ijkl}$).
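In the mean/scale form Beta(π, φ) used above, the response has mean π and, assuming the usual variance for this parameterization, variance π(1 - π)/(1 + φ). The sketch below converts (π, φ) to the classical shape parameters a = πφ and b = (1 - π)φ and checks that both variance formulas agree; the values of π and φ are illustrative only.

```python
def beta_shapes(pi, phi):
    """Convert mean pi (0 < pi < 1) and scale phi > 0 to shape parameters (a, b)."""
    if not 0 < pi < 1 or phi <= 0:
        raise ValueError("need 0 < pi < 1 and phi > 0")
    return pi * phi, (1 - pi) * phi


def beta_variance_mean_form(pi, phi):
    """Variance pi * (1 - pi) / (1 + phi) of the mean/scale parameterization."""
    return pi * (1 - pi) / (1 + phi)


def beta_variance_shape_form(a, b):
    """Classical beta variance a*b / ((a + b)**2 * (a + b + 1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


a, b = beta_shapes(0.5, 3.0)
print(a, b)
print(beta_variance_mean_form(0.5, 3.0), beta_variance_shape_form(a, b))
```

Both forms give 0.0625 here. A large scale parameter, such as the φ of about 52.28 reported later for this experiment, shrinks the variance well below π(1 - π).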


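The following SAS program fits the beta GLMM with repeated measures.

proc glimmix data=inhibition method=laplace nobound;
  class Bio Day Con Rep;
  /* beta response with logit link; pct must lie strictly between 0 and 1 */
  model pct = Con|Day / dist=beta link=logit;
  /* random effect of concentration within bioassay, Toeplitz(1) structure */
  random intercept / subject=Con(Bio) type=toep(1);
  lsmeans Con|Day / lines ilink;
run;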


Before fitting the generalized linear mixed model, we compare the fit of several covariance structures with a beta distribution in the response variable (Table 9.31 part (a)). According to the fit statistics, the covariance structures that best fit the data are the Toeplitz type (Toep(1)) and unstructured (UN).

Having defined the covariance structure, in this case Toeplitz of order 1, we present part of the results of the data fit (Table 9.31 part (b)). The fit statistic Pearson's chi-square/DF = 1.07 indicates that there is no overdispersion and that the beta distribution fits the data adequately. The estimated variance component of concentration within bioassay, Con(Bio), under Toeplitz(1) is $\hat{\sigma}^2_{\mathrm{con}(\omega)} = 0.00285$, and the scale parameter is $\hat{\phi} = 52.281$ (part (c)).
Table 9.31 Fit statistics for the conditional distribution and variance components

(a) Fit statistics CS AR(1) Toep(1) UN

-2 Log likelihood -523.69 -523.69 -523.69
AIC (smaller is better) -469.69 -471.69 -471.69
AICC (smaller is better) -458.73 -461.59 -461.59
BIC (smaller is better) -467.54 -469.62 -469.62
CAIC (smaller is better) -440.54 -443.62 -443.62
HQIC (smaller is better) -484.16 -485.62 -485.62

(b) Fit statistics for conditional distribution

-2 Log L (pct | r. effects) | -529.79
Pearson's chi-square | 177.68
Pearson's chi-square/DF | 1.07

(c) Covariance parameter estimates

Cov Parm | Subject | Estimate | Standard error
Variance | Con(Bio) | 0.002849 | 0.004147
Scale | | 52.2809 | 5.8849


Table 9.32 Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Con | 3 | 4 | 125.40 | 0.0002
Day | 5 | 138 | 10.99 | <0.0001
Day*Con | 15 | 138 | 2.25 | 0.0074


The fixed effects tests indicate a highly significant effect of salt concentration (P = 0.0002), time (P < 0.0001), and the concentration × time interaction (P = 0.0074) on the growth inhibition of Fusarium sp. (Table 9.32).

The linear predictors and estimated mean proportions of the factors (Table 9.33 parts (a) and (b)) and of their interaction (Table 9.34) are found under the columns “Estimate” and “Mean,” respectively.
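The “Mean” column is the inverse logit of the “Estimate” column. A minimal Python sketch of that back-transformation, with the data-scale standard error approximated by the delta method (our assumption about how it is computed; it reproduces the tabulated value):

```python
import math


def ilink_logit(estimate, se):
    """Back-transform a logit-scale estimate and its standard error.

    Mean = 1 / (1 + exp(-estimate)); by the delta method,
    SE(mean) = mean * (1 - mean) * SE(estimate).
    """
    mean = 1.0 / (1.0 + math.exp(-estimate))
    return mean, mean * (1.0 - mean) * se


# Concentration 0 ppm in Table 9.33(a): Estimate = -3.5438, SE = 0.1499
mean, se_mean = ilink_logit(-3.5438, 0.1499)
print(round(mean, 5), round(se_mean, 6))
```

This gives approximately 0.02809 and 0.004093, the values listed for 0 ppm in Table 9.33(a).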


9.8 Carbon Dioxide (CO2) Emission as a Function of Soil
Moisture and Microbial Activity


Productive agricultural soil requires a certain level of aeration to maintain active plant root growth and soil microbial activity. One scientist found that soil oxygenation levels had been affected in soils fertilized with nutrient-rich sludge from a sewage treatment plant. The level of soil aeration can be reduced by (1) the high water content of the added sludge and the compaction caused by the heavy machinery used to apply it and, ironically, (2) the increased microbial activity that occurs when sludge with a high organic matter content is added. The objective of the research was to determine the moisture levels at which aeration becomes a limiting factor for
Table 9.33 Concentration and measurement time least squares means on the model scale (Estimate) and the data scale (Mean)

(a) Conc least squares means

Con | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
0 | -3.5438 | 0.1499 | 4 | -23.64 | <0.0001 | 0.02809 | 0.004093
500 | -1.0650 | 0.05941 | 4 | -17.93 | <0.0001 | 0.2563 | 0.01133
1000 | -0.9847 | 0.05895 | 4 | -16.70 | <0.0001 | 0.2720 | 0.01167
2000 | -0.4487 | 0.05891 | 4 | -7.62 | 0.0016 | 0.3897 | 0.01401

(b) Day least squares means

Day | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
1 | -1.6017 | 0.1161 | 138 | -13.79 | <0.0001 | 0.1677 | 0.01621
2 | -1.0446 | 0.08689 | 138 | -12.02 | <0.0001 | 0.2603 | 0.01673
3 | -1.2475 | 0.08794 | 138 | -14.19 | <0.0001 | 0.2231 | 0.01524
4 | -1.5668 | 0.1020 | 138 | -15.36 | <0.0001 | 0.1727 | 0.01457
5 | -1.7606 | 0.1039 | 138 | -16.94 | <0.0001 | 0.1467 | 0.01301
6 | -1.8422 | 0.1067 | 138 | -17.26 | <0.0001 | 0.1368 | 0.01260


microbial activity in the soil. The study included a control treatment (no sludge) and three treatments using sludge as a fertilizer with different moisture contents; the moisture levels of the fertilized soil were 0.24, 0.26, and 0.28 kg water/kg soil.

Soil samples were randomly assigned to the four treatments in a completely randomized design. Soil samples were placed in sealed containers and incubated under conditions favorable for microbial activity. The soil was compacted in the containers to simulate the degree of compaction experienced in the field. Microbial activity, measured as an increase in CO2, was used as a measure of the level of soil oxygenation. The CO2 evolution/kilogram soil/day in each container was measured 2, 4, 6, and 8 days after the start of the incubation period. Microbial activity in each soil sample was recorded as the percentage increase in CO2 produced above the atmospheric level. The data are shown in Table 9.35.

The analysis of variance table for this experiment is shown below (Table 9.36). 

Let $pct_{ijk}$ be the percentage of CO2 emission, assuming that $pct_{ijk}$ has a beta distribution with mean $\pi_{ijk}$ and scale parameter $\phi$, i.e., $pct_{ijk} \sim \mathrm{Beta}(\pi_{ijk}, \phi)$. The linear predictor $\eta_{ijk}$ that relates the mean to the link function is given by

$$\eta_{ijk} = \theta + \alpha_i + a(r)_{ik} + \tau_j + (\alpha\tau)_{ij}, \quad i = 1, \ldots, 4;\ j = 1, \ldots, 4;\ k = 1, 2, 3$$

where θ is the intercept, $\alpha_i$ is the fixed effect of treatment i, $a(r)_{ik}$ is the random effect of treatment nested in repetition k, assuming $a(r)_{ik} \sim N(0, \sigma^2_{a(r)})$, $\tau_j$ is the fixed effect of measurement time j, and $(\alpha\tau)_{ij}$ is the interaction effect of treatment with measurement time. The link function is defined by $\mathrm{logit}(\pi_{ijk}) = \eta_{ijk}$.
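The beta distribution is defined only on the open interval (0, 1), whereas Table 9.35 records CO2 evolution as percentages (for example, 2.53). The estimated means in Table 9.39 are of the order of 0.01, which suggests, although the text does not state it, that the percentages were divided by 100 before fitting. A minimal sketch of that step, with a check that every value falls strictly inside (0, 1):

```python
def to_proportion(percentages):
    """Convert percentages to proportions and check they suit a beta model."""
    props = [p / 100.0 for p in percentages]
    bad = [p for p in props if not 0.0 < p < 1.0]
    if bad:
        raise ValueError(f"values outside (0, 1): {bad}")
    return props


# Day 2 readings of the three control containers in Table 9.35
print(to_proportion([0.22, 0.68, 0.68]))
```

Values of exactly 0 or 100 percent would be rejected, since a beta response cannot take them.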


The following SAS syntax fits a GLMM on repeated measures with a beta 
distribution. 
Table 9.34 Measuring time*salt concentration interaction on the model scale (Estimate) and the data scale (Mean)

Day*con least squares means

Day | Con | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
1 | 0 | -4.0127 | 0.4083 | 138 | -9.83 | <0.0001 | 0.01776 | 0.007124
1 | 500 | -0.8709 | 0.1123 | 138 | -7.76 | <0.0001 | 0.2951 | 0.02335
1 | 1000 | -0.6848 | 0.1092 | 138 | -6.27 | <0.0001 | 0.3352 | 0.02434
1 | 2000 | -0.8382 | 0.1579 | 138 | -5.31 | <0.0001 | 0.3019 | 0.03328
2 | 0 | -3.5743 | 0.2957 | 138 | -12.09 | <0.0001 | 0.02727 | 0.007844
2 | 500 | -0.4140 | 0.1061 | 138 | -3.90 | 0.0001 | 0.3980 | 0.02543
2 | 1000 | -0.3519 | 0.1053 | 138 | -3.34 | 0.0011 | 0.4129 | 0.02554
2 | 2000 | 0.1616 | 0.1043 | 138 | 1.55 | 0.1235 | 0.5403 | 0.02590
3 | 0 | -2.9511 | 0.2944 | 138 | -10.02 | <0.0001 | 0.04969 | 0.01390
3 | 500 | -0.9923 | 0.1149 | 138 | -8.64 | <0.0001 | 0.2705 | 0.02266
3 | 1000 | -0.9044 | 0.1131 | 138 | -8.00 | <0.0001 | 0.2881 | 0.02319
3 | 2000 | -0.1423 | 0.1041 | 138 | -1.37 | 0.1739 | 0.4645 | 0.02590
4 | 0 | -3.5167 | 0.3558 | 138 | -9.88 | <0.0001 | 0.02884 | 0.009967
4 | 500 | -1.2429 | 0.1213 | 138 | -10.25 | <0.0001 | 0.2239 | 0.02108
4 | 1000 | -1.0361 | 0.1159 | 138 | -8.94 | <0.0001 | 0.2619 | 0.02241
4 | 2000 | -0.4716 | 0.1065 | 138 | -4.43 | <0.0001 | 0.3842 | 0.02520
5 | 0 | -3.5503 | 0.3579 | 138 | -9.92 | <0.0001 | 0.02791 | 0.009710
5 | 500 | -1.4180 | 0.1269 | 138 | -11.17 | <0.0001 | 0.1950 | 0.01992
5 | 1000 | -1.4489 | 0.1277 | 138 | -11.34 | <0.0001 | 0.1902 | 0.01967
5 | 2000 | -0.6251 | 0.1083 | 138 | -5.77 | <0.0001 | 0.3486 | 0.02458
6 | 0 | -3.6579 | 0.3691 | 138 | -9.91 | <0.0001 | 0.02514 | 0.009046
6 | 500 | -1.4522 | 0.1277 | 138 | -11.37 | <0.0001 | 0.1897 | 0.01963
6 | 1000 | -1.4823 | 0.1289 | 138 | -11.50 | <0.0001 | 0.1851 | 0.01944
6 | 2000 | -0.7765 | 0.1106 | 138 | -7.02 | <0.0001 | 0.3151 | 0.02388


proc glimmix data=co2 method=laplace; 
class trt container time; 

model pct = trt|time/dist=beta link=logit; 
random trt/subject=container; 

lsmeans trt|time /lines ilink; 

run; 


Part of the results is shown below. The fit statistics under different covariance structures (Table 9.37 part (a)), such as AIC and AICC, indicate that a Toeplitz-type covariance structure of order 1 provides the best fit to the dataset of this experiment.
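When -2 log likelihood is almost the same for several structures, the choice is driven by the number of parameters. Assuming AIC = -2 log L + 2p, the sketch below picks the structure with the smallest AIC from Table 9.37(a), skipping the unstructured fit that did not converge, and recovers p for each structure.

```python
def n_params_from_aic(aic, minus2ll):
    """AIC = -2 log L + 2p, so p = (AIC - (-2 log L)) / 2."""
    return round((aic - minus2ll) / 2)


def best_structure(aic_by_structure):
    """Return the covariance structure with the smallest AIC, skipping failed fits."""
    fitted = {k: v for k, v in aic_by_structure.items() if v is not None}
    return min(fitted, key=fitted.get)


# Values from Table 9.37(a); UN did not converge
minus2ll = {"CS": -433.28, "AR(1)": -433.94, "Toep(1)": -433.28, "UN": None}
aic = {"CS": -395.28, "AR(1)": -395.94, "Toep(1)": -397.28, "UN": None}

print(best_structure(aic))
for s in ("CS", "AR(1)", "Toep(1)"):
    print(s, n_params_from_aic(aic[s], minus2ll[s]))
```

Toep(1) is selected: it reaches the same -2 log L as CS with 18 parameters instead of 19.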

Table 9.38 part (a) shows the estimated variance component due to treatment × repetition, i.e., $\hat{\sigma}^2_{a(r)} = 0.03363$, and the estimated scale parameter $\hat{\phi} = 790.82$, and the hypothesis tests (part (b)) indicate that the treatments yielded statistically different means (P = 0.0011).
Table 9.35 Repeated measurements of CO2 emissions by bacterial activity in soil under different moisture conditions

% CO2 evolution/kilogram soil/day
Moisture (kg water/kg soil) | Container | Day 2 | Day 4 | Day 6 | Day 8
Control | 1 | 0.22 | 0.56 | 0.66 | 0.89
Control | 2 | 0.68 | 0.91 | 1.06 | 0.8
Control | 3 | 0.68 | 0.45 | 0.72 | 0.89
0.24 | 1 | 2.53 | 2.7 | 2.1 | 1.5
0.24 | 2 | 2.59 | 1.43 | 1.35 | 0.74
0.24 | 3 | 0.56 | 1.37 | 1.87 | 1.21
0.26 | 1 | 0.22 | 0.22 | 0.2 | 0.11
0.26 | 2 | 0.45 | 0.28 | 1.24 | 0.86
0.26 | 3 | 0.22 | 0.33 | 0.34 | 0.2
0.28 | 1 | 0.22 | 0.8 | 0.8 | 0.37
0.28 | 2 | 0.22 | 0.62 | 0.89 | 0.95
0.28 | 3 | 0.22 | 0.56 | 0.69 | 0.63


Table 9.36 Analysis of variance of a CRD with repeated measures

Sources of variation | Degrees of freedom
Treatment | (a - 1) = 4 - 1 = 3
Error(a) | a(r - 1) = 4 × 2 = 8
Measurement time | (t - 1) = 4 - 1 = 3
Treatment × time | (a - 1)(t - 1) = 9
Error(b) | a(t - 1)(r - 1) = 4 × 3 × 2 = 24
Total | a × t × r - 1 = 4 × 4 × 3 - 1 = 47
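The degrees of freedom in Table 9.36 follow directly from the number of treatments a, measurement times t, and replicates r. A small Python sketch that reproduces them and checks that the sources add up to the total:

```python
def repeated_measures_df(a, t, r):
    """Degrees of freedom for a CRD with a treatments, t times and r replicates."""
    df = {
        "Treatment": a - 1,
        "Error(a)": a * (r - 1),
        "Time": t - 1,
        "Treatment x time": (a - 1) * (t - 1),
        "Error(b)": a * (t - 1) * (r - 1),
    }
    df["Total"] = a * t * r - 1
    # The sources of variation must add up to the total degrees of freedom
    if sum(v for k, v in df.items() if k != "Total") != df["Total"]:
        raise ArithmeticError("degrees of freedom do not add up")
    return df


print(repeated_measures_df(a=4, t=4, r=3))
```

Error(a) = 8 and Error(b) = 24 are the denominator degrees of freedom reported for the treatment and time tests in Table 9.38.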


Table 9.37 Fit statistics of the beta GLMM under different covariance structures

(a) Fit statistics | CS | AR(1) | Toep(1) | UN
-2 Log likelihood | -433.28 | -433.94 | -433.28 | Did not converge
AIC (smaller is better) | -395.28 | -395.94 | -397.28 |
AICC (smaller is better) | -368.14 | -368.80 | -373.69 |
BIC (smaller is better) | -412.41 | -413.07 | -413.50 |
CAIC (smaller is better) | -393.41 | -394.07 | -395.50 |
HQIC (smaller is better) | -429.71 | -430.37 | -429.89 |

(b) Fit statistics for conditional distribution | CS | AR(1) | Toep(1) | UN
-2 Log L (y | r. effects) | -446.54 | -444.41 | -446.58 | Did not converge
Pearson's chi-square | 30.46 | 33.98 | 30.38 |
Pearson's chi-square/DF | 0.63 | 0.71 | 0.63 |


Table 9.39 shows the estimated average CO2 emissions of the tested treatments: the treatment with moisture 0.24 kg water/kg soil favored higher microbial activity, whereas the treatments with moisture levels of 0.26 and 0.28 kg water/kg soil showed similar microbial activity.
Table 9.38 Variance components and hypothesis tests

(a) Covariance parameter estimates

Cov Parm | Subject | Estimate | Standard error
Variance | Container | 0.03363 | 0.03153
Scale (φ) | | 790.82 | 190.89

(b) Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Trt | 3 | 8 | 15.52 | 0.0011
Time | 3 | 24 | 2.94 | 0.0537
Trt*time | 9 | 24 | 1.29 | 0.2914


Table 9.39 Means and standard errors on the model scale (Estimate) and the data scale (Mean)

(a) Trt least squares means

Trt | Estimate | Standard error | DF | t-value | Pr > |t| | Mean | Standard error mean
C | -4.9242 | 0.1595 | 8 | -30.87 | <0.0001 | 0.007216 | 0.001143
T0.24 | -4.1331 | 0.1343 | 8 | -30.79 | <0.0001 | 0.01578 | 0.002085
T0.26 | -5.5728 | 0.1898 | 8 | -29.36 | <0.0001 | 0.003786 | 0.000716
T0.28 | -5.1588 | 0.1728 | 8 | -29.86 | <0.0001 | 0.005716 | 0.000982
Fig. 9.2 CO2 emission as a measure of microbial activity


Figure 9.2 clearly shows that the treatment with moisture 0.24 kg water/kg soil provides the best conditions for soil microbial activity, whereas the other treatments are associated with markedly lower activity of microorganisms.
9.9 Effect of Soil Compaction and Soil Moisture
on Microbial Activity


A soil scientist conducted an experiment to evaluate the effects of soil compaction and soil moisture on microbial activity. Aeration may be restricted in highly saturated or compacted soils, thus reducing microbial activity. The experiment consisted of three levels of soil compaction (1.1, 1.4, and 1.6 Mg soil/m³) and three levels of soil moisture (0.1, 0.2, and 0.24 kg water/kg soil). The treated soil samples were placed in sealed containers and incubated under conditions favorable to microbial activity. The percentage increase in CO2 produced above atmospheric levels was measured in each soil sample. The experimental design was a completely randomized design (CRD) with a 3 × 3 factorial structure of treatments. Two replicates of the soil container units were prepared for each treatment. The evolution of CO2/kg soil/day was measured on three successive days. The data from this experiment are shown below in Table 9.40.

The analysis of variance table for this experiment is shown below (Table 9.41). 

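Let $pct_{ijkl}$ be the percentage of CO2 emission and assume that $pct_{ijkl}$ has a beta distribution with mean $\pi_{ijkl}$ and scale parameter $\phi$, i.e., $pct_{ijkl} \sim \mathrm{Beta}(\pi_{ijkl}, \phi)$. The linear predictor $\eta_{ijkl}$ that relates the mean to the link function is given by

$$\eta_{ijkl} = \theta + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \alpha\beta(r)_{ijl} + \tau_k + (\alpha\tau)_{ik} + (\beta\tau)_{jk} + (\alpha\beta\tau)_{ijk}$$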


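Table 9.40 Percentage of CO2 by bacterial activity as a function of soil density (Mg soil/m³) and soil humidity (kg water/kg soil)

Density | Humidity | Replication | Day 1 | Day 2 | Day 3
1.1 | 0.1 | 1 | 2.7 | ? | 0.11
1.1 | 0.1 | 2 | 2.9 | 1.57 | 1.25
1.1 | 0.2 | 1 | 5.2 | 5.04 | 3.7
1.1 | 0.2 | 2 | 3.6 | 3.92 | 2.69
1.1 | 0.24 | 1 | 4 | 3.47 | 3.47
1.1 | 0.24 | 2 | 4.1 | 3.47 | 2.46
1.4 | 0.1 | 1 | 2.6 | 1.12 | 0.9
1.4 | 0.1 | 2 | 2.2 | 0.78 | 0.34
1.4 | 0.2 | 1 | 4.3 | 3.36 | 3.02
1.4 | 0.2 | 2 | 3.9 | 2.91 | 2.35
1.4 | 0.24 | 1 | 1.9 | 3.02 | 2.58
1.4 | 0.24 | 2 | 3 | 3.81 | 2.69
1.6 | 0.1 | 1 | 2 | 0.67 | 0.22
1.6 | 0.1 | 2 | 3 | 0.78 | 0.22
1.6 | 0.2 | 1 | 3.8 | 2.8 | 2.02
1.6 | 0.2 | 2 | 2.6 | 3.14 | 2.46
1.6 | 0.24 | 1 | 1.3 | 2.69 | 2.46
1.6 | 0.24 | 2 | 0.5 | 0.34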
Table 9.41 Analysis of variance of a CRD with factorial structure of treatments in repeated measures

Sources of variation | Degrees of freedom
Density | (a - 1) = 3 - 1 = 2
Humidity | (b - 1) = 3 - 1 = 2
Density × humidity | (a - 1)(b - 1) = 4
Error(a) | ab(r - 1) = 3 × 3 × 1 = 9
Time | (c - 1) = 3 - 1 = 2
Density × time | (a - 1)(c - 1) = 4
Humidity × time | (b - 1)(c - 1) = 4
Density × humidity × time | (a - 1)(b - 1)(c - 1) = 8
Error(b) | by difference = 17
Total | a × b × c × r - 1 - 1 = 3 × 3 × 3 × 2 - 1 - 1 = 52

Note: Here, 1 degree of freedom was subtracted from the total number of observations of the experiment because there is one missing observation.


i = 1, 2, 3; j = 1, 2, 3; k = 1, 2, 3; l = 1, 2

where θ is the intercept, $\alpha_i$ is the fixed effect of the density factor, $\beta_j$ is the fixed effect of the humidity factor, $(\alpha\beta)_{ij}$ is the effect of the interaction between density and humidity, $\alpha\beta(r)_{ijl}$ is the random effect of the interaction density × humidity × repetition, assuming $\alpha\beta(r)_{ijl} \sim N(0, \sigma^2_{\alpha\beta(r)})$, $\tau_k$ is the fixed effect of measurement time, $(\alpha\tau)_{ik}$ is the fixed effect of the interaction between density and measurement time, $(\beta\tau)_{jk}$ is the fixed effect of the interaction between moisture and measurement time, and $(\alpha\beta\tau)_{ijk}$ is the fixed effect of the density × humidity × time interaction. The link function is defined by $\mathrm{logit}(\pi_{ijkl}) = \eta_{ijkl}$.
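The degrees of freedom in Table 9.41, including the effect of the missing observation on Error(b), can be reproduced with a short Python sketch; a, b, c, and r are the numbers of density levels, humidity levels, measurement times, and replicates.

```python
def factorial_rm_df(a, b, c, r, n_missing=0):
    """Degrees of freedom for an a x b factorial CRD measured at c times.

    Error(b) is obtained by difference, so each missing observation reduces it.
    """
    df = {
        "Density": a - 1,
        "Humidity": b - 1,
        "Density x humidity": (a - 1) * (b - 1),
        "Error(a)": a * b * (r - 1),
        "Time": c - 1,
        "Density x time": (a - 1) * (c - 1),
        "Humidity x time": (b - 1) * (c - 1),
        "Density x humidity x time": (a - 1) * (b - 1) * (c - 1),
    }
    total = a * b * c * r - 1 - n_missing
    df["Error(b)"] = total - sum(df.values())
    df["Total"] = total
    return df


print(factorial_rm_df(a=3, b=3, c=3, r=2, n_missing=1))
```

With one missing value Error(b) is 17, the denominator degrees of freedom used for the time-related tests in Table 9.43; a complete dataset would give 18.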

The following SAS GLIMMIX syntax fits a repeated measures GLMM with a 
beta distribution. 


proc glimmix data=co2_fact nobound method=laplace;
  class density humidity rep time;
  model pct = density|humidity|time / dist=beta link=logit;
  /* density x humidity effects for each replicate, Toeplitz(1) structure */
  random density*humidity / subject=rep type=toep(1);
  lsmeans density|humidity|time / lines ilink;
run;


Part of the results is listed below. The fit statistics (AIC and AICC) in Table 9.42 part (a) indicate that a Toeplitz covariance structure of order 1 provides the best fit to the data.

The type III tests of fixed effects in Table 9.43 indicate that soil density (P = 0.0021), humidity (P < 0.0001), the evolution of emission over time (P < 0.0001), and the interaction between moisture and time of measurement (P < 0.0001) are statistically significant.
Table 9.42 Fit statistics of a beta GLMM with a factorial structure of treatments under different covariance structures

(a) Fit statistics | CS | AR(1) | Toep(1) | UN
-2 Log likelihood | -413.74 | -413.72 | -413.72 | Did not converge
AIC (smaller is better) | -353.74 | -353.72 | -355.72 |
AICC (smaller is better) | -269.19 | -269.18 | -280.07 |
BIC (smaller is better) | -392.94 | -392.93 | -393.62 |
CAIC (smaller is better) | -362.94 | -362.93 | -364.62 |
HQIC (smaller is better) | -435.73 | -435.71 | -434.98 |

(b) Fit statistics for conditional distribution | CS | AR(1) | Toep(1) | UN
-2 Log L (y | r. effects) | -413.74 | -413.72 | -413.72 | Did not converge
Pearson's chi-square | 64.60 | 64.65 | 64.65 |
Pearson's chi-square/DF | 1.22 | 1.22 | 1.22 |


Table 9.43 Hypothesis testing of the factors under study

Type III tests of fixed effects

Effect | Num DF | Den DF | F-value | Pr > F
Density | 2 | 9 | 13.14 | 0.0021
Humidity | 2 | 9 | 69.66 | <0.0001
Density*humidity | 4 | 9 | 3.57 | 0.0524
Time | 2 | 17 | 21.97 | <0.0001
Density*time | 4 | 17 | 0.72 | 0.5904
Humidity*time | 4 | 17 | 17.85 | <0.0001
Density*humidity*time | 8 | 17 | 2.12 | 0.0923


The least squares means obtained with the LSMEANS statement on the model scale are shown under the “Estimate” column, and those on the data scale under the “Mean” column, of Table 9.44.
Another advantage of the GLIMMIX procedure is the ability to fit models to data in which the distribution and/or link function varies across response variables. This is accomplished by specifying DIST=BYOBS or LINK=BYOBS in the model definition. The dataset created below provides an example of a bivariate outcome: the condition and length of hospital stay for 32 patients with herniorrhaphy. These data are taken from Mosteller and Tukey (1977) and reproduced in Hand et al. (1994) (Table 9.45).

For each patient, two responses were recorded. A binary response takes the value 
one if a patient experienced a routine recovery and the value zero if postoperative 
intensive care was required. The second response variable is a count variable that 
Table 9.44 Means, standard errors, and comparison of means (least significant difference,
LSD) on the model scale (Estimate) and data scale (Mean)

(a) Density*humidity least squares means

                                Standard                                      Standard
Density  Humidity  Estimate     error     DF   t value   Pr > |t|   Mean      error mean
1.1      0.10      -4.5961      0.1614    9    -28.48    <0.0001    0.009990  0.001596
1.1      0.20      -3.1829      0.07606   9    -41.85    <0.0001    0.03981   0.002908
1.1      0.24      -3.3152      0.08060   9    -41.13    <0.0001    0.03505   0.002726
1.4      0.10      -4.4567      0.1450    9    -30.74    <0.0001    0.01147   0.001643
1.4      0.20      -3.3798      0.08333   9    -40.56    <0.0001    0.03293   0.002654
1.4      0.24      -3.5363      0.08932   9    -39.59    <0.0001    0.02830   0.002456
1.6      0.10      -4.7890      0.1809    9    -26.47    <0.0001    0.008252  0.001481
1.6      0.20      -3.5453      0.08972   9    -39.52    <0.0001    0.02805   0.002446
1.6      0.24      -4.3213      0.1440    9    -30.00    <0.0001    0.01311   0.001863

(b) T grouping of density*humidity least squares means (α = 0.05)

LS means with the same letter are not significantly different

Density  Humidity  Estimate
1.1      0.20      -3.1829     A
1.1      0.24      -3.3152   B A
1.4      0.20      -3.3798   B A
1.4      0.24      -3.5363   B
1.6      0.20      -3.5453   B
1.6      0.24      -4.3213   C
1.4      0.10      -4.4567   C
1.1      0.10      -4.5961   C
1.6      0.10      -4.7890   C


measures the length of hospital stay after the surgery (in days). The binary variable 
“OKstatus” is a regressor variable that distinguishes patients according to their 
postoperative physical status (“1” implies better status), and the variable age is the 
age of the patient. 

These data could be modeled separately, with a logistic model for the binary outcome
and a Poisson model for the count outcome. Such separate analyses, however, would
not take into account the correlation between the two response variables. It is
reasonable to assume that the length of post-surgery hospitalization is correlated with
the type of recovery and depends on whether the patient required intensive care.

In the following analysis, the correlation between the two types of response
variables for a patient is modeled with shared random effects (G-side). The dataset
variable “dist” identifies the distribution of each observation. For the observations
that follow a binary distribution, the response variable option (event='1')
determines which value of the binary variable is modeled as the event of interest.
Since no LINK= option is specified, the link function is also chosen observation by
observation, as the default link for the respective distribution. The following
GLIMMIX statements fit this dataset with two distributions:
Table 9.45 Hospital condition and length of stay of patients

D  Patient  Age  OKstatus  y        D  Patient  Age  OKstatus  y
B  1        78   1         0        B  17       79   0         0
P  1        78   1         9        P  17       79   0         3
B  2        60   1         0        B  18       51   1         1
P  2        60   1         4        P  18       51   1         5
B  3        68   1         1        B  19       57   1         1
P  3        68   1         7        P  19       57   1         8
B  4        62   0         1        B  20       51   0         1
P  4        62   0         35       P  20       51   0         8
B  5        76   0         0        B  21       48   1         1
P  5        76   0         9        P  21       48   1         3
B  6        76   1         1        B  22       48   1         1
P  6        76   1         7        P  22       48   1         5
B  7        64   1         1        B  23       66   1         1
P  7        64   1         5        P  23       66   1         8
B  8        74   1         1        B  24       71   1         0
P  8        74   1         16       P  24       71   1         2
B  9        68   0         1        B  25       75   0         0
P  9        68   0         7        P  25       75   0         7
B  10       79   1         0        B  26       2    1         1
P  10       79   1         11       P  26       2    1         0
B  11       80   0         1        B  27       65   1         0
P  11       80   0         4        P  27       65   1         16
B  12       48   1         1        B  28       42   1         0
P  12       48   1         9        P  28       42   1         3
B  13       35   1         1        B  29       54   1         0
P  13       35   1         2        P  29       54   1         2
B  14       58   1         1        B  30       43   1         1
P  14       58   1         4        P  30       43   1         3
B  15       40   1         1        B  31       4    1         1
P  15       40   1         3        P  31       4    1         3
B  16       19   1         1        B  32       52   1         1
P  16       19   1         4        P  32       52   1         8


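data Poi_Bin;
   length dist $7;
   /* @@ reads several records from each data line */
   input d$ patient age OKstatus response @@;
   /* d = 'B': binary recovery outcome; d = 'P': length-of-stay count */
   if d = 'B' then dist = 'Binary';
   else dist = 'Poisson';
   datalines;
B 1 78 1 0   P 1 78 1 9    B 2 60 1 0   P 2 60 1 4
B 3 68 1 1   P 3 68 1 7    B 4 62 0 1   P 4 62 0 35
B 5 76 0 0   P 5 76 0 9    B 6 76 1 1   P 6 76 1 7
B 7 64 1 1   P 7 64 1 5    B 8 74 1 1   P 8 74 1 16
B 9 68 0 1   P 9 68 0 7    B 10 79 1 0  P 10 79 1 11
B 11 80 0 1  P 11 80 0 4   B 12 48 1 1  P 12 48 1 9
B 13 35 1 1  P 13 35 1 2   B 14 58 1 1  P 14 58 1 4
B 15 40 1 1  P 15 40 1 3   B 16 19 1 1  P 16 19 1 4
B 17 79 0 0  P 17 79 0 3   B 18 51 1 1  P 18 51 1 5
B 19 57 1 1  P 19 57 1 8   B 20 51 0 1  P 20 51 0 8
B 21 48 1 1  P 21 48 1 3   B 22 48 1 1  P 22 48 1 5
B 23 66 1 1  P 23 66 1 8   B 24 71 1 0  P 24 71 1 2
B 25 75 0 0  P 25 75 0 7   B 26 2 1 1   P 26 2 1 0
B 27 65 1 0  P 27 65 1 16  B 28 42 1 0  P 28 42 1 3
B 29 54 1 0  P 29 54 1 2   B 30 43 1 1  P 30 43 1 3
B 31 4 1 1   P 31 4 1 3    B 32 52 1 1  P 32 52 1 8
;

proc glimmix data=Poi_Bin;
   class patient dist;
   /* Separate intercepts and slopes per distribution; DIST=BYOBS(dist)
      selects the distribution (and its default link) for each observation */
   model response(event='1') = dist dist*age dist*OKstatus /
         noint s dist=byobs(dist);
   /* Shared patient intercept links the binary and count responses */
   random int / subject=patient;
   lsmeans dist / lines ilink;
run;

Table 9.46 Model information

Model information
Dataset                         WORK.POI_BIN
Response variable               Response
Response distribution           Multivariate
Link function                   Multiple
Variance function               Default
Variance matrix blocked by      Patient
Estimation technique            Residual pseudo-likelihood (PL)
Degrees of freedom method       Containment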


Some of the output is shown below. Table 9.46 (“Model information”) shows that
the distribution of the data is multivariate and that possibly multiple link functions
are involved; by default, PROC GLIMMIX uses a logit link for the binary observations
and a log link for the Poisson data.
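To make the observation-by-observation choice concrete, the following Python sketch evaluates the conditional log-likelihood of one patient's two records: the binary record uses a Bernoulli density with a logit link, the count record a Poisson density with a log link, and both linear predictors share the same patient random intercept u. It illustrates the model structure only, not the pseudo-likelihood algorithm that PROC GLIMMIX uses; the coefficients are taken from Table 9.48, while the function names and the choice u = 0 are assumptions made for the example.

```python
import math

# Fixed-effect estimates from Table 9.48: (intercept, age slope, OKstatus slope)
COEF = {
    "Binary": (5.7783, -0.07572, -0.4697),
    "Poisson": (0.8410, 0.01875, -0.1856),
}


def obs_loglik(dist, y, age, okstatus, u):
    """Conditional log-likelihood of one observation given the patient effect u."""
    b0, b1, b2 = COEF[dist]
    eta = b0 + b1 * age + b2 * okstatus + u  # same u for both records of a patient
    if dist == "Binary":
        # Logit link: log P(Y = y) = y * eta - log(1 + exp(eta))
        return y * eta - math.log1p(math.exp(eta))
    # Log link: Poisson log-probability with mean exp(eta)
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def patient_loglik(records, u):
    """Sum the binary and count contributions for one patient."""
    return sum(obs_loglik(d, y, age, ok, u) for d, y, age, ok in records)


# Patient 1 from Table 9.45: age 78, OKstatus 1, binary y = 0, count y = 9
patient1 = [("Binary", 0, 78, 1), ("Poisson", 9, 78, 1)]
print(round(patient_loglik(patient1, u=0.0), 4))
```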

Table 9.47 shows that the generalized chi-square/DF statistic equals 0.90, which
indicates that there is no overdispersion, and also gives the estimated variance
component due to patient, $\hat{\sigma}^2_{\text{patient}} = 0.299$. The type III tests of fixed
effects for age and postoperative status are shown in part (c): age has a significant
effect (P = 0.0069), whereas OKstatus does not (P = 0.7909).

In addition to the above results, the estimated intercepts and the slopes of each
variable for both probability distributions are listed in Table 9.48.

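Thus, to calculate the probability that a patient will experience a routine recovery,
the following expression is used:

$$\hat{\pi} = \frac{1}{1 + \exp\{-(\hat{\beta}_0 + \hat{\beta}_1 \times \text{age} + \hat{\beta}_2 \times \text{okstatus})\}} = \frac{1}{1 + \exp\{-5.7783 + 0.07572 \times \text{age} + 0.4697 \times \text{okstatus}\}}$$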
Table 9.47 Results of the analysis of variance

(a) Fit statistics
-2 Res log pseudo-likelihood    226.71
Generalized chi-square          52.25
Gener. chi-square/DF            0.90

(b) Covariance parameter estimates
Cov Parm     Subject    Estimate   Standard error
Intercept    Patient    0.299      0.1116

(c) Type III tests of fixed effects
Effect          Num DF   Den DF   F-value   Pr > F
Dist            2        29       2.74      0.0814
Age*dist        2        29       5.94      0.0069
OKstatus*dist   2        29       0.24      0.7909

Table 9.48 Maximum likelihood estimators for fixed effects

Solutions for fixed effects
Effect          Dist      Estimate   Standard error   DF   t-value   Pr > |t|
Dist            Binary    5.7783     2.9048           29   1.99      0.0562
Dist            Poisson   0.8410     0.5696           29   1.48      0.1506
Age*dist        Binary    -0.07572   0.03791          29   -2.00     0.0552
Age*dist        Poisson   0.01875    0.007383         29   2.54      0.0167
OKstatus*dist   Binary    -0.4697    1.1251           29   -0.42     0.6794
OKstatus*dist   Poisson   -0.1856    0.3020           29   -0.61     0.5435


whereas the following expression is used to calculate the expected length of hospital
stay after the surgery (in days):

$$\hat{\lambda} = \exp\{\hat{\alpha}_0 + \hat{\alpha}_1 \times \text{age} + \hat{\alpha}_2 \times \text{okstatus}\} = \exp\{0.8410 + 0.01875 \times \text{age} - 0.1856 \times \text{okstatus}\}$$
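The two expressions can be evaluated directly. The Python sketch below applies them to a hypothetical 60-year-old patient with OKstatus = 1, using the Table 9.48 estimates. As in the expressions above, the patient random effect is set to zero, so the results are predictions for a typical patient rather than population-averaged values; the function names are illustrative.

```python
import math


def prob_routine_recovery(age, okstatus):
    """Inverse logit of the binary linear predictor (Table 9.48), random effect = 0."""
    eta = 5.7783 - 0.07572 * age - 0.4697 * okstatus
    return 1.0 / (1.0 + math.exp(-eta))


def expected_stay_days(age, okstatus):
    """Inverse log of the Poisson linear predictor (Table 9.48), random effect = 0."""
    eta = 0.8410 + 0.01875 * age - 0.1856 * okstatus
    return math.exp(eta)


# Hypothetical 60-year-old patient with better postoperative status (OKstatus = 1)
p = prob_routine_recovery(60, 1)
days = expected_stay_days(60, 1)
print(f"P(routine recovery) = {p:.3f}, expected stay = {days:.2f} days")
```

For this patient the sketch gives a probability of routine recovery of about 0.68 and an expected stay of about 5.9 days; older patients get a lower probability and a longer expected stay, in line with the signs of the age slopes.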
Exercise 9.11.1 Consider an experiment in which three treatments are compared.
There are r blocks of n animals, formed using grouping criteria relevant to the
experiment. Within each block, one animal is randomly assigned to each treatment.
Measurements were taken on the animals at “week 0,” when treatments were applied,
and again at weeks 4 and 12. Variables measured included weight, the presence or
absence of disease symptoms, and the severity of symptoms, classified as “worse,” “no
change,” or “better.” The focus of this experiment was the repeated measures analysis
of the last two types of data in the above list, that is, categorical data that are binary
(presence or absence of symptoms) or ordinal (severity ratings), in an experiment with
a repeated measures and treatment factor structure. Regardless of whether the observations are
Table 9.49 Results of a repeated measures experiment with an ordinal response variable

               Week 0                 Week 4                 Week 12
Response       Placebo  Trt1  Trt2    Placebo  Trt1  Trt2    Placebo  Trt1  Trt2
Worse          60       59    54      14       5     3       13       10    7
No change      7        6     13      34       33    38      25       17    21
Better         0        0     0       15       22    17      17       28    21


normally distributed, categorical, or have some other distribution, a general
approach to repeated measures analysis based on the linear mixed model uses the
following general form:

Observation = systematic between-subjects variation + random between-subjects
variation + systematic within-subjects effects + random within-subjects variation.


Table 9.49 shows the data from this experiment; each cell contains the number of
animals in a given treatment × week × response category combination.


(a) List all the components of the repeated measures under a multinomial GLMM. 

(b) Study and choose the best covariance structure that models this dataset. Cite the 
most relevant results. 

(c) Fit the multinomial cumulative logit model to these data. Perform a complete and 
appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments
(ii) Odds ratio interpretation
(iii) The expected probability per category for each treatment


(d) Test whether the proportional odds assumption is viable. Cite relevant evidence 
to support your conclusion regarding the adequacy of the assumption. 


Repeat (b) through (d), assuming a generalized multinomial logit in Exercise 
9.11.1. Discuss your results. 

Repeat (b) through (d) assuming a multinomial cumulative probit in Exercise 
9.11.1. Discuss your results and compare with those found in (1) and (2). 

Alternatively, the contingency table approach can be implemented using a
log-linear model. For Exercise 9.11.1, fit the log-linear model

$$\log(\lambda_{ijk}) = \mu + \tau_i + w_j + (\tau w)_{ij} + c_k + (\tau c)_{ik} + (wc)_{jk},$$

where $\lambda_{ijk}$ is the expected count for treatment i, week j, and response
category k, and $\tau$, $w$, and $c$ refer to treatment, week, and response category effects,
respectively.
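Fitting the log-linear model requires the data in long form: one record per treatment × week × response category cell, with the cell count as a Poisson response under a log link. The Python sketch below builds that layout from the counts in Table 9.49; the category labels follow the “worse,” “no change,” “better” classification given in the exercise, and the dictionary structure and function name are assumptions of this example. The fit itself would then be carried out in the reader's GLMM software, for instance PROC GLIMMIX with DIST=POISSON.

```python
# Counts from Table 9.49: COUNTS[week][response] = (Placebo, Trt1, Trt2)
TREATMENTS = ("Placebo", "Trt1", "Trt2")
COUNTS = {
    0: {"Worse": (60, 59, 54), "No change": (7, 6, 13), "Better": (0, 0, 0)},
    4: {"Worse": (14, 5, 3), "No change": (34, 33, 38), "Better": (15, 22, 17)},
    12: {"Worse": (13, 10, 7), "No change": (25, 17, 21), "Better": (17, 28, 21)},
}


def to_long(counts):
    """Return one (treatment, week, response, count) record per table cell."""
    records = []
    for week, by_response in counts.items():
        for response, cells in by_response.items():
            for treatment, count in zip(TREATMENTS, cells):
                records.append((treatment, week, response, count))
    return records


records = to_long(COUNTS)
print(len(records), "cells; total count =", sum(r[3] for r in records))
```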
Table 9.50 Nitrogen injection treatment factors study

Handling practices                                                     Application rate
1 = N surface applied without additional water injection               2.5 g/m²
2 = N surface applied with supplementary water injection
3 = N injected with a number 56 nozzle (7.6 cm depth of injection)     5.0 g/m²
4 = N injected with a number 53 nozzle (12.7 cm depth of injection)

Handling practice 1                     Handling practice 2
Quality     N1   N2   Total             Quality     N1   N2   Total
Poor        14   5    19                Poor        15   8    23
Average     2    11   13                Average     1    8    9
Good        0    0    0                 Good        0    0    0
Excellent   0    0    0                 Excellent   0    0    0
Total       16   16   32                Total       16   16   32

Handling practice 3                     Handling practice 4
Quality     N1   N2   Total             Quality     N1   N2   Total
Poor        0    0    0                 Poor        1    0    1
Average     9    2    11                Average     12   4    16
Good        7    14   21                Good        0    0    0
Excellent   0    0    0                 Excellent   0    0    0
Total       16   16   32                Total       16   16   32


Exercise 9.11.2 Fertilization of turf has traditionally been accomplished through 
surface applications. The introduction of new equipment (Hydroject) has made it 
possible to place soluble materials below the surface (Table 9.50). 


A study was conducted during the 1997 growing season to compare surface
application and subsoil injection of nitrogen on the green color of bentgrass
(Agrostis palustris L. Huds) 1 year after transplanting. The treatment structure was
a full factorial of the management practice factor (four levels) and the rate of
nitrogen application per square meter (two levels, in g/m²). The eight treatment
combinations were arranged in a completely randomized design with four replications.
Turf color was evaluated in each experimental unit weekly for 4 weeks as poor,
average, good, or excellent.

Of particular interest were the water injection effect, the subsurface effect, and the
comparison of injection versus surface applications. These are contrasts among the
levels of the management practice factor, and the primary objective was to determine
whether they interact with the rate of application.


(a) List all the GLMM components of this experiment. 
(b) Fit the multinomial cumulative logit proportional odds model to these data. 
Perform a complete and appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments
(ii) Interpretation of the odds ratios
(iii) The expected probability per category for each treatment


(c) Test whether the proportional odds assumption is viable. Cite relevant evidence
to support your conclusion regarding the adequacy of the assumption.


Exercise 9.11.3 Refer to Exercise 9.11.1. 


(a) Fit the multinomial generalized logit proportional odds model to these data. 
(b) List all the components of the GLMM of this experiment. 
(c) Perform a complete and appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments
(ii) Interpretation of odds ratios
(iii) The expected probability per category for each treatment


(d) Test whether the proportional odds assumption is viable. Cite relevant evidence 
to support your conclusion regarding the adequacy of the assumption. 


Exercise 9.11.4 Refer to Exercise 9.11.1. 


(a) List all the components of the GLMM of this experiment. 
(b) Fit the multinomial cumulative probit proportional odds model to these data. 
Perform a complete and appropriate analysis of the data, focusing on: 


(i) An evaluation of the effects of the combination of treatments 
(ii) Interpretation of the odds ratios 
(iii) The expected probability per category for each treatment


(c) Test whether the proportional odds assumption is viable. Cite relevant evidence 
to support your conclusion regarding the adequacy of the assumption. 


Appendix 

Data: Feeding line experiment 

Tray Feeding Run Proportion Tray Feeding Run Proportion
1 H 1 0.18217 1 H 3 0.06818
2 H 1 0.15493 2 H 3 0.05874
3 H 1 0.15906 3 H 3 0.05757
4 H 1 0.15869 4 H 3 0.10349
5 H 1 0.14891 5 H 3 0.08564
6 H 1 0.17654 6 H 3 0.09359
7 H 1 0.12915 7 H 3 0.09706
8 H 1 0.12895 8 H 3 0.13188
9 H 1 0.16688 9 H 3 0.18477
10 H 1 0.11965 10 H 3 0.10966
11 H 1 0.21719 11 H 3 0.18069
12 H 1 0.20797 12 H 3 0.18182


1 L 1 0.70601 1 L 3 0.75524 
2 L 1 0.68817 2 L 3 0.77249 
3 L 1 0.68317 3 L 3 р 
4 L 1 0.77805 4 L 3 0.84204 
5 L 1 0.76692 5 L 3 0.81572 
6 L 1 0.79127 6 L 3 0.79161 
7 L 1 0.73653 7 L 3 0.81234 
8 L 1 0.74939 8 L 3 0.81795 
9 L 1 0.78773 9 L 3 0.8225 

10 L 1 0.7381 10 L 3 0.79384 

11 L 1 0.88486 11 L 3 0.8135 

12 L 1 0.90401 12 L 3 0.83965 
1 H 2 0.07547 1 H 4 0.07105
2 H 2 0.05801 2 H 4 0.05511
3 H 2 0.0565 3 H 4 0.05217
4 H 2 0.09579 4 H 4 0.10567
5 H 2 0.10954 5 H 4 0.0755
6 H 2 0.12154 6 H 4 0.0853
7 H 2 0.1144 7 H 4 0.09363
8 H 2 0.13728 8 H 4 0.11154
9 H 2 0.15012 9 H 4 0.16264
10 H 2 0.12113 10 H 4 0.09215
11 H 2 0.17633 11 H 4 0.1834
12 H 2 0.16408 12 H 4 0.21016
1 L 2 0.78318 1 L 4 0.76556 
2 L 2 0.78418 2 L 4 0.78307 
3 L 2 0.78589 3 L 4 0.76486 
4 L 2 0.78867 4 L 4 0.82391
5 L 2 0.81988 5 L 4 0.8044
6 L 2 0.82793 6 L 4 0.81178 
7 L 2 0.81384 7 L 4 0.84339 
8 L 2 0.81037 8 L 4 0.78833 
9 L 2 0.77528 9 L 4 0.79804 

10 L 2 0.78916 10 L 4 0.82236 

11 L 2 0.87109 11 L 4 0.83807 

12 L 2 0.84704 12 L 4 0.85532 
Data: Percentage inhibition (Bio = bioassay, Con = concentration, Rep = repetition,
Por = percentage inhibition)

Bio Day Con Rep Por Bio Day Con Rep Por 

1 1 1 6 4 35.1724 
1 1 4 5.2632 2 1 2 0.0016 
1 1 1 15.7895 2 1 3 14.2857 
1 1 2 26.3158 2 1 1 42.8571 
1 1 3 15.7895 2 1 2 42.8571 
1 1 4 15.7895 2 1 3 42.8571 
1 1 1 36.8421 2 1 4 42.8571 
1 1 2 36.8421 2 1 1 7.1429
1 1 3 36.8421 2 1 2 42.8571 
1 1 4 36.8421 2 1 3 42.8571 
1 1 1 15.7895 2 1 4 42.8571 
1 1 2 36.8421 2 2 1 1.3699 
1 1 3 36.8421 2 2 2 1.3699 
1 1 4 36.8421 2 2 4 1.3699 
1 2 2 1.9355 2 2 1 34.2466 
1 2 3 4.5161 2 2 2 31.5068 
1 2 4 1.9355 2 2 3 42.4658 
1 2 1 43.2258 2 2 4 36.9863 
1 2 2 48.3871 2 2 1 34.2466 
1 2 3 40.6452 2 2 2 47.9452
1 2 4 40.6452 2 2 3 45.2055 
1 2 1 35.4839 2 2 4 45.2055 
1 2 2 45.8065 2 2 1 47.9452 
1 2 3 43.2258 2 2 2 53.4247 
1 2 4 32.9032 2 2 3 50.6849 
1 2 1 58.7097 2 2 4 56.1644 
1 2 2 53.5484 2 3 1 4.2735
1 2 3 53.5484 2 3 4 14.5299 
1 2 2000 4 58.7097 2 3 500 1 28.2051 
1 3 0 2 1.2346 2 3 500 2 28.2051 
1 3 0 3 3.7037 2 3 500 3 35.0427 
1 3 500 1 25.9259 2 3 500 4 24.7863 
1 3 500 2 23.4568 2 3 1000 1 24.7863 
1 3 500 3 23.4568 2 3 1000 2 35.0427 
1 3 500 4 24.6914 2 3 1000 3 24.7863 
1 3 1000 1 30.8642 2 3 1000 4 26.4957 
1 3 1000 2 32.0988 2 3 2000 1 40.1709 
1 3 1000 3 28.3951 2 3 2000 2 38.4615 
1 3 1000 4 25.9259 2 3 2000 3 47.0085 
1 3 2000 1 53.0864 2 3 2000 4 41.8803 
1 3 2000 2 49.3827 2 4 0 2 1.5015
1 3 2000 3 49.3827 2 4 0 3 1.5015 




1 3 2000 4 51.8519 2 4 0 4 1.5015 
1 4 0 3 4.6729 2 4 500 1 20.7207 
1 4 500 1 19.6262 2 4 500 2 23.1231
1 4 500 2 20.5607 2 4 500 3 27.9279 
1 4 500 3 22.4299 2 4 500 4 20.7207 
1 4 500 4 20.5607 2 4 1000 1 35.1351 
1 4 1000 1 21.4953 2 4 1000 2 26.7267 
1 4 1000 2 21.4953 2 4 1000 3 26.7267 
1 4 1000 3 23.3645 2 4 1000 4 32.7327 
1 4 1000 4 20.5607 2 4 2000 1 33.9339 
1 4 2000 1 42.0561 2 4 2000 2 37.5375 
1 4 2000 2 36.4486 2 4 2000 3 44.7447 
1 4 3 32.7103 2 4 4 38.7387 
1 4 4 40.1869 2 5 2 2.008 

1 5 3 4.065 2 5 4 0.4016 
1 2 4 4.065 2 5 1 13.253 

1 2 1 21.1382 2 5 2 21.2851 
1 5 2 24.3902 2 5 3 21.2851 
1 5 3 17.0732 2 5 4 18.0723 
1 5 4 17.0732 2 5 1 21.2851 
1 5 1 18.6992 2 5 2 18.0723 
1 5 2 18.6992 2 5 3 16.4659 
1 5 3 20.3252 2 5 4 16.4659 
1 5 4 17.8862 2 5 1 35.743 

1 5 1 41.4634 2 5 2 34.1365 
1 5 2 38.2114 2 5 3 29.3173 
1 5 3 34.1463 2 5 4 30.9237 
1 5 4 33.3333 2 6 2 4.2159 
1 6 3 4.8276 2 6 4 0.1686 
1 6 4 2.069 2 6 1 18.3811 
1 6 1 17.2414 2 6 2 20.4047
1 6 2 18.6207 2 6 3 22.4283 
1 6 3 16.5517 2 6 4 20.4047 
1 6 4 13.7931 2 6 1 21.0793 
1 6 1 15.8621 2 6 2 17.7066 
1 6 2 16.5517 2 6 3 17.7066 
1 6 3 15.8621 2 6 4 20.4047 
1 6 4 18.6207 2 6 1 31.1973 
1 6 1 32.4138 2 6 2 29.1737 
1 6 2000 2 29.6552 2 6 2000 3 29.8482 
1 6 2000 3 31.7241 2 6 2000 4 30.5228 